Preface


This book was originally designed for a three-week lecturing module on the
principles of Geographic Information Systems (GIS), to be taught to students in
all education programmes at ITC as the second module in their course.

A geographic information system is a computer-based system that supports the 
study of natural and man-made phenomena with an explicit location in space. 
To this end, the GIS allows data entry, data manipulation, and production of 
interpretable output that may provide new insights about the phenomena. 

There are many uses for GIS technology, including soil science; management 
of agricultural, forest and water resources; urban planning; geology; mineral
exploration; cadastre and environmental monitoring. It is likely that the student 
reader of this textbook is already educated in one of these fields; the intention of 
the book is to lay the foundation for the reader to also become proficient in the 
use of GIS technology. 

With so many different fields of application, it is impossible to single out the 
specific techniques of GIS usage for all the fields in a single book. Rather, the 
book focuses on a number of common and important topics that any expert GIS
user should be aware of. GIS is a continuously evolving scientific discipline,
and for this reason ITC students should be provided with a broad foundation of
relevant concepts, techniques and technology.

The book is also meant to define a common understanding and terminology 
for follow-up modules, which the student may elect later in her/his respective 
programme. The textbook does not stand independently, but was developed in
conjunction with the textbook on Principles of Remote Sensing.

Structure of this book


The chapters of the book have been arranged in a semi-classical set-up. Chapters
1 to 3 provide a general introduction to the field, discussing various interesting
geographic phenomena (Chapter 1), the ways these phenomena can be
represented in a computer system (Chapter 2), and the data processing systems
that are used to this end (Chapter 3). Spatial referencing and positioning
(including map projections and GPS) is dealt with in Chapter 4.

Chapters 5 to 7 subsequently focus on the process of using a GIS environment. We
discuss how spatial data can be obtained, entered and prepared for use (Chapter 5),
how data can be manipulated to improve our understanding of the phenomena
that they represent (Chapter 6), and how the results of such manipulations
can be visualized (Chapter 7). Special attention throughout these chapters
is devoted to the specific characteristics of geospatial data.

Each chapter contains sections, a summary and some exercises. The exercises 
are meant to be a test of understanding of the chapter's contents; they are not 
practical exercises. They may not be typical exam questions either! Besides the 
regular chapters, the back part of the book contains a bibliography, a glossary, 
and an index. The book is also made available as an electronic PDF document
which can be browsed but not printed.

Preface to the fourth edition


This fourth edition of the GIS book is an update of the previous edition, with
some reshuffling of the book's content and some minor changes to the layout
(marginal editorial disagreements notwithstanding). Care has been taken to provide
updated material, improve readability and browse-ability of the text, and
achieve greater integration through cross-referencing.

Significant changes include a rewritten section on spatial referencing by Richard 
Knippers, restructuring of material in chapters 3, 4 and 5, and a range of edits 
for improved continuity throughout the chapters. A keyword-in-the-margin
layout was adopted to aid in browse-ability of the main text. 

It must be stressed that the design of this book remains that of a textbook on 
'principles'. A much bigger overhaul would have been required for another
format, and this was considered undesirable for its purpose, and infeasible in 
the time allowed, also because of dependencies with already developed teaching 
materials such as exercises and overhead slides. 

This textbook continues to be used at ITC in all educational programmes, as well 
as in other programmes around the globe that are developed in collaboration 
with ITC. A Korean translation of both textbooks has already been published, 
and other translation projects are under way. People with an interest in such an 
undertaking are invited to contact the editors. 

A book such as this will never be perfect, and the field of GIScience has not yet 
reached the type of maturity where debates over definitions and descriptions 
are no longer needed. The Editors welcome any comments and criticisms, in a 
continued effort to improve the materials.


Otto Huisman and Rolf A. de By, Enschede, July 2009 


Chapter 1 

A gentle introduction to GIS 





1.1 The nature of GIS 


The purpose of this chapter is to set the scene for the remainder of this book by 
providing a general overview of some of the terms, concepts and ideas which 
will be covered in greater detail in later sections. 

The acronym GIS stands for geographic information system. As the name suggests,
a GIS is a tool for working with geographic information. Section 1.1.2 provides
a more formal definition, and later sections will look in more detail at some of
the key functions that set GIS apart from other kinds of information systems.

GIS have rapidly developed since the late 1970s in terms of both technical and
processing capabilities, and today are used all over the world for a wide
range of purposes. Let us begin by looking at some of these:


• An urban planner might want to assess the extent of urban fringe growth
in her/his city, and quantify the population growth that some suburbs are
witnessing. S/he might also like to understand why these particular suburbs
are growing and others are not;

• A biologist might be interested in the impact of slash-and-burn practices on
the populations of amphibian species in the forests of a mountain range to
obtain a better understanding of long-term threats to those populations;

• A natural hazard analyst might like to identify the high-risk areas of annual
monsoon-related flooding by investigating rainfall patterns and terrain
characteristics;

• A geological engineer might want to identify the best localities for constructing
buildings in an earthquake-prone area by looking at rock formation
characteristics;

• A mining engineer could be interested in determining which prospective
copper mines should be selected for future exploration, taking into account
parameters such as extent, depth and quality of the ore body, amongst
others;

• A geoinformatics engineer hired by a telecommunications company may want
to determine the best sites for the company's relay stations, taking into account
various cost factors such as land prices, undulation of the terrain et
cetera;

• A forest manager might want to optimize timber production using data on
soil and current tree stand distributions, in the presence of a number of
operational constraints, such as the need to preserve species diversity in
the area;

• A hydrological engineer might want to study a number of water quality parameters
of different sites in a freshwater lake to improve understanding
of the current distribution of Typha reed beds, and why it differs from that
of a decade ago.

In the examples presented above, all the professionals work with positional data,
also called spatial data. Spatial data refers to where things are, or perhaps where
they were or will be. To be more precise, these professionals deal with questions
related to geographic space, which we define as having positional data relative to
the Earth's surface.

Positional data of a non-geographic nature also exists. Examples include the
location of the appendix in the human body, or the location of headlights on a
car. These examples involve positional information, but it makes no sense to use
the Earth's surface as a reference for these applications. For the purposes of this
book we are only interested in geographic data. To illustrate these issues further,
the following section provides an example of the application of GIS to the study
of global weather patterns.

1.1.1 Some fundamental observations

Our world is dynamic. Many aspects of our daily lives and our environment
are constantly changing, and not always for the better. Some of these changes
appear to have natural causes (e.g. volcanic eruptions, meteorite impacts), while
others are the result of human modification of the environment (e.g. land use
changes or land reclamation from the sea, a favourite pastime of the Dutch).
There are also a large number of global changes for which the cause remains
unclear: these include global warming, the El Niño/La Niña events, or at smaller
scales, landslides and soil erosion. In summary, we can say that changes to the
Earth's geography can have natural or man-made causes, or a mix of both. If it is a
mix of causes, we usually do not fully understand the changes.

For background information on El Niño, please refer to Figure 1.1. This figure
presents information related to a study area (the equatorial Pacific Ocean),
with positional data taking a prominent role. Although it is quite a complex
phenomenon, we will use the study of El Niño as an example application of GIS in
the remainder of this chapter.

In order to understand what is going on in our world, we study the processes
or phenomena that bring about geographic change. In many cases, we want to
broaden or deepen our understanding to help us make decisions, so that we can
take the best course of action. For instance, if we understand El Niño better, and
can forecast that another event may take place in the year 2012, we can devise
an action plan to reduce the expected losses in the fishing industry, to lower the
risks of landslides caused by heavy rains or to build up water supplies in areas
of expected droughts.

El Niño is an aberrant pattern in weather and sea water temperature that occurs with some frequency (every
4-9 years) in the Pacific Ocean along the Equator. It is characterized by weaker western winds
across the ocean, less upwelling of cold, nutrient-rich, deep-sea water near the South American coast, and
therefore by substantially higher sea surface temperatures (see figures below). It is generally believed that
El Niño has a considerable impact on global weather systems, and that it is the main cause for droughts in
Wallacea and Australia, as well as for excessive rains in Peru and the southern U.S.A.

El Niño means 'little boy', and usually manifests itself around Christmas. There is also another, less
pronounced, pattern of colder temperatures, known as La Niña ('little girl'), which occurs less frequently
than El Niño. The most recent occurrence of El Niño started in September 2006 and lasted until early 2007.
From June 2007 on, data indicated a weak La Niña event, strengthening in early 2008. The figures below
illustrate an extreme El Niño year (1997; considered to be the most extreme of the twentieth century) and a
subsequent La Niña year (1998).

Left figures are from December 1997, an extreme El Niño event; right figures are of the subsequent year,
indicating a La Niña event. In all figures, colour is used to indicate sea water temperature, while arrow lengths
indicate wind speeds. The top figures provide information about absolute values, while the bottom figures
are labelled with values relative to the average situation for the month of December. The bottom figures also
give an indication of wind speed and direction. See also Figure 1.3 for an indication of the area covered by
the array of buoys.


Upper figures: absolute values of average SST [°C] and WS [m/s] 





Lower figures: differences with normal situation 

Figure 1.1: The El Niño event of 1997 compared with a more normal year 1998.
The top figures indicate average Sea Surface Temperature (SST, in colour) and
average Wind Speed (WS, in arrows) for the month of December. The bottom figures
illustrate the anomalies (differences from a normal situation) in both SST
and WS. The island in the lower left corner is (Papua) New Guinea with
the Bismarck Archipelago. Latitude has been scaled by a factor two. Data
source: National Oceanic and Atmospheric Administration, Pacific Marine
Environmental Laboratory, Tropical Atmosphere Ocean project (NOAA/PMEL/TAO).

The fundamental problem that we face in many uses of GIS is that of understanding
phenomena that have a spatial or geographic dimension, as well as a temporal
dimension. We are facing 'spatio-temporal' problems. This means that our
object of study has different characteristics for different locations (the geographic
dimension) and also that these characteristics change over time (the temporal
dimension). The El Niño event is a good example of such a phenomenon, because
sea surface temperatures differ between locations, and sea surface temperatures
change from one week to the next.

1.1.2 Defining GIS

The previous section illustrated the use of GIS in a range of settings to operate on
data that represent geographic phenomena. This provides us with a functional 
definition (after Aronoff [3]): 


A GIS is a computer-based system that provides the following four sets of 
capabilities to handle georeferenced data: 

1. Data capture and preparation 

2. Data management, including storage and maintenance 

3. Data manipulation and analysis 

4. Data presentation 


This implies that a GIS user can expect support from the system to enter
(georeferenced) data, to analyse it in various ways, and to produce presentations
(including maps and other types) from the data. This would include support for
various kinds of coordinate systems and transformations between them, options
for analysis of the georeferenced data, and obviously a large degree of freedom
of choice in the way this information is presented (such as colour scheme, symbol
set, and medium used).

For examples of each of these capabilities, let us take a closer look at the El Niño
example. Many professionals closely study this phenomenon, most notably
meteorologists and oceanographers. They prepare all sorts of products, such as the
maps of Figure 1.1, in order to improve their understanding. To do so, they need
to obtain data about the phenomenon, which, as shown above, includes measurements
of sea water temperature and wind speed from many locations.
This data must be stored and processed to enable it to be analysed, and allow
the results from the analysis to be interpreted. The way this data is presented
could play an important role in its interpretation.

We have listed these capabilities above in the most natural order in which they
take place. But this is only a sketch of an ideal situation, and it is often the case
that data analysis suggests that we need more data about the problem. Data
presentation may also lead to follow-up questions for which we need to do more
analysis, and for which we may need more data, or perhaps better data.
Consequently, several of the steps may be repeated a number of times before we are
happy with the results. We look into these steps in more detail below, in the
context of the El Niño example.

Data capture and preparation

In the El Niño case, data capture refers to the collection of sea water temperatures
and wind speed measurements. This is achieved by placing buoys with
measuring equipment at various places in the ocean. Each buoy measures a
number of things: wind speed and direction; air temperature and humidity; and
sea water temperature at the surface and at various depths down to 500 metres.
For the sake of our example we will focus on sea surface temperature (SST) and
wind speed (WS).

A typical buoy is illustrated in Figure 1.2, which shows the placement of various
sensors on the buoy. For monitoring purposes, some 70 buoys were deployed
at strategic places within 10° latitude of the Equator, between the Galapagos
Islands and Papua New Guinea. Figure 1.3 provides a map that illustrates the
positions of these buoys. The buoys have been anchored, so they are stationary.
Occasional malfunctioning is caused by high seas and bad weather or by the
buoys becoming entangled in long-line fishing nets.¹

All the data that a buoy obtains through its thermometers and other sensors, as
well as the buoy's geographic position, are transmitted by satellite communication
daily. Later in this book, and also in the textbook on Principles of Remote
Sensing [53], many other ways of acquiring geographic data will be discussed.

¹ As Figure 1.3 shows, there happen to be three types of buoy, but their differences are not
directly relevant to our example, so we will ignore them here.

Figure 1.2: Schematic overview of an ATLAS type buoy for monitoring
sea water temperatures in the El Niño project. Labels in the figure: WS sensor;
humidity sensor; Argos antenna (3.8 m above sea); data logger; toroidal buoy
(diameter 2.3 m); SST sensor; temperature sensors; sensor cable; 3/8" wire rope;
500 m; 3/4" nylon rope; acoustic release; anchor (4200 lbs).

Figure 1.3: The array of positions of sea surface temperature and wind
speed measuring buoys in the equatorial Pacific Ocean.


Data management 

For our example application, data management refers to the storage and maintenance
of the data transmitted by the buoys via satellite communication. This
phase requires a decision to be made on how best to represent our data, both in
terms of their spatial properties and the various attribute values which we need
to store. Data storage and maintenance is discussed at length in Chapter 3, and
we will not go into further detail here. We will from here on assume that the
acquired data has been put in digital form, that is, it has been converted into
computer-readable format, so that we can begin our analysis.


Data manipulation and analysis 

Once the data has been collected and organized in a computer system, we can
start analysing it. Let us look at what processes were involved in the eventual
production of the maps of Figure 1.1. Note that the actual production of
maps belongs to the phase of data presentation that we discuss below.

Here, we look at how data generated at the buoys was processed before map
production. A closer look at Figure 1.1 reveals that the data being presented are
based on the monthly averages for SST and WS (for two months), not on single
measurements for a specific date. Moreover, the two lower figures provide
comparisons with 'the normal situation', which probably means that a comparison
was made with the December averages of several years.

The initial (buoy) data have been generalized from 70 point measurements (one
for each buoy) to cover the complete study area. Clearly, for positions in the
study area for which no data was available, some type of interpolation took
place, probably using data of nearby buoys. This is a typical GIS function:
deriving an estimated value for a property for some location where we have not
measured.

It appears that the following steps took place for the upper two figures (here
we look at SST computations only; WS analysis will have been conducted
similarly):

1. For each buoy, the average SST for each month was computed, using the
daily SST measurements for that month. This is a simple computation.

2. For each buoy, the monthly average SST was taken together with the
geographic location, to obtain a georeferenced list of averages, as illustrated in
Table 1.1.

3. From this georeferenced list, through a method of spatial interpolation, the
estimated SST of other positions in the study area was computed. This
step was performed as often as needed, to obtain a fine mesh of positions
with measured or estimated SSTs from which the maps of Figure 1.1 were
eventually derived.

4. We assume that previous to the above steps we had obtained data about
average SST for the month of December for a series of years. This too may
have been spatially interpolated to obtain a 'normal situation' December
data set of a fine resolution.
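
Steps 1 and 2, and the comparison with a 'normal situation', can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the daily readings, the buoy positions expressed in decimal degrees, the December normals and all function names are hypothetical, and the anomaly is computed per buoy rather than on an interpolated mesh.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily readings: (buoy id, ISO date, SST in degrees Celsius).
daily_readings = [
    ("B0789", "1997-12-01", 28.0),
    ("B0789", "1997-12-02", 28.2),
    ("B7504", "1997-12-01", 27.5),
    ("B7504", "1997-12-02", 27.1),
]

# Hypothetical buoy positions as (longitude east, latitude north) in degrees.
buoy_positions = {"B0789": (165.0, 5.0), "B7504": (180.0, 0.0)}

# Hypothetical long-term December averages per buoy (the 'normal situation').
december_normals = {"B0789": 27.6, "B7504": 27.0}


def monthly_averages(readings):
    """Step 1: average the daily SST values per buoy and per month."""
    grouped = defaultdict(list)
    for buoy, date, sst in readings:
        grouped[(buoy, date[:7])].append(sst)  # date[:7] is "YYYY-MM"
    return {key: mean(values) for key, values in grouped.items()}


def georeferenced_list(averages, positions, month):
    """Step 2: attach each buoy's position to its average for one month."""
    return [
        (buoy, positions[buoy], round(avg, 2))
        for (buoy, avg_month), avg in sorted(averages.items())
        if avg_month == month
    ]


def anomalies(geo_list, normals):
    """Difference between each monthly average and the buoy's normal value."""
    return [(buoy, pos, round(avg - normals[buoy], 2)) for buoy, pos, avg in geo_list]


averages = monthly_averages(daily_readings)
dec_1997 = georeferenced_list(averages, buoy_positions, "1997-12")
dec_1997_anomalies = anomalies(dec_1997, december_normals)
print(dec_1997)
print(dec_1997_anomalies)
```

Each entry of the resulting list pairs an attribute value with a position, which is what makes the list georeferenced.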

Let us first clarify what is meant by a 'georeferenced' list. Data is georeferenced
if it is associated with some position on the Earth's surface, by using a spatial
reference system. This can be achieved using (longitude, latitude) coordinates,
or by other means that we discuss in Chapter 4. The key issue is that there is
some kind of coordinate system as a reference. In our list, we have associated
average sea surface temperature observations with spatial locations, and thereby
we have georeferenced them.
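
To compute with (longitude, latitude) positions written with a hemisphere letter, such as 7°30'S, software commonly converts them to signed decimal degrees first. The sketch below assumes the convention that north and east are positive; the function name and the accepted input pattern are illustrative choices, not something prescribed by this book.

```python
import re


def to_decimal_degrees(text):
    """Convert a coordinate such as "7°30'S" or "165°E" to signed decimal degrees.

    Assumed convention: north and east are positive, south and west negative.
    """
    match = re.fullmatch(r"\s*(\d+)°\s*(?:(\d+)['’])?\s*([NSEW])\s*", text)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees, minutes, hemisphere = match.groups()
    value = int(degrees) + (int(minutes) if minutes else 0) / 60
    return -value if hemisphere in "SW" else value


# A position written as (110°W, 7°30'S) becomes a (longitude, latitude) pair.
position = (to_decimal_degrees("110°W"), to_decimal_degrees("7°30'S"))
print(position)
```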

Table 1.1: The georeferenced list (in part) of average sea surface temperatures
obtained for the month December 1997.

Buoy     Geographic position    Dec. 1997 avg. SST
B0789    (165°E, 5°N)           28.02 °C
B7504    (180°E, 0°N)           27.34 °C
B1882    (110°W, 7°30'S)        25.28 °C

In step 3 above, we mentioned spatial interpolation. To understand this issue,
it is important to note that sea surface temperature is a property that occurs
everywhere in the ocean, and not only at the buoys where measurements are
taken. The buoys only provide a set of sample observations of sea surface
temperature. We can use these sample measurements to estimate the value of SST in
places where we have not measured it, using a technique called spatial
interpolation. The theory of spatial interpolation is extensive, but this is not the place to
discuss it. There are in fact many different spatial interpolation techniques, not
just one, and some are better in specific situations than others. This is however
a typical example of functions that a GIS can perform on user data.
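
As one concrete illustration of spatial interpolation, the sketch below uses inverse distance weighting (IDW), in which nearby samples count more than distant ones. This is an assumed choice for illustration only; the text does not say which technique produced Figure 1.1. Distances are computed naively in degrees on a flat plane, and longitudes are expressed as degrees east (so 110°W becomes 250) to avoid the jump at the 180° meridian.

```python
import math


def idw_estimate(samples, target, power=2.0):
    """Estimate a value at `target` from ((x, y), value) samples.

    Each sample is weighted by 1 / distance**power, so closer samples
    have more influence on the estimate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in samples:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # the target coincides with a measured position
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# The three buoys of Table 1.1 as ((degrees east, degrees north), SST in °C).
samples = [
    ((165.0, 5.0), 28.02),
    ((180.0, 0.0), 27.34),
    ((250.0, -7.5), 25.28),
]

# Estimated SST at an unmeasured position between the first two buoys.
estimate = idw_estimate(samples, (170.0, 2.0))
print(round(estimate, 2))
```

Because the estimate is a weighted average, it always lies between the lowest and highest sample values. Repeating the call for every node of a regular grid would yield the kind of fine mesh mentioned in step 3.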

Data presentation

After the data manipulations discussed above, our data is prepared for producing
output, in this case the maps of Figure 1.1. The data presentation phase
deals with putting it all together into a format that communicates the result of
data analysis in the best possible way.

Many issues arise in this phase. Among other things, we need to consider what
the message is that we want to portray, who the audience is, what kind of
presentation medium will be used, which rules of aesthetics apply, and what
techniques are available for representation. These issues may sound a little abstract,
so let us clarify with the El Niño case.

For Figure 1.1, we can make the following statements: 

• The message we wanted to portray is what the El Niño and La Niña
events are, both in absolute figures and in relative figures, i.e. as differences
from a normal situation.

• The audience for this data presentation clearly were the readers of this
textbook, i.e. students of ITC who want to obtain a better understanding of
GIS.

• The medium was this book (printed matter of A4 size) and possibly a website.
The book's typesetting imposes certain restrictions, like maximum
size, font style and font size.

• The rules of aesthetics demanded many things: the maps should be printed
north-up; with clear georeferencing; with intuitive use of symbols et cetera.
We actually also violated some rules of aesthetics, for instance, by applying
a different scaling factor in latitude (vertically) compared to longitude
(horizontally).

• The techniques that we used included the use of a colour scheme and
isolines,² plus a number of other techniques.


² Isolines are discussed in Chapter 2.

1.1.3 GISystems, GIScience and GIS applications

The previous discussion defined a geographic information system (in the 'narrow'
sense) in terms of its functions, as a computerized system that facilitates
the phases of data entry, data management, data analysis and data presentation
specifically for dealing with georeferenced data. In the 'wider' sense, a
functioning GIS requires both hardware and software, and also people such as
the database creators or administrators, analysts who work with the software,
and the users of the end product. For the purposes of this book we will concern
ourselves with the 'narrow' definition, and focus on the specifics of these
so-called GISystems.

The discipline that deals with all aspects of the handling of spatial data and
geoinformation is called geographic information science (often abbreviated to
geo-information science or just GIScience).


Geo-Information Science is the scientific field that attempts to integrate different
disciplines studying the methods and techniques of handling spatial
information.


Related terms include geoinformatics, geomatics, and spatial information science.
These are all similar terms which have much the same meaning, although each
approach has slight differences in the way it deals with problems, some emphasizing
engineering approaches, others computational solutions, and so on.

As well as being aware of these differences, it is also important to be aware of
the difference between a geographic information system and a GIS application.
In the example discussed above, we determined sea water temperatures of
the El Niño event in two subsequent December months. The same software
package that we used to do this analysis could also be used to analyse forest
plots in northern Thailand, for instance. That would be a different application,
but would make use of the same software. GIS software can (generically) be applied
to many different applications. When there is no risk of ambiguity, people
sometimes do not make the distinction between a 'GIS' and a 'GIS application'.

Project-based GIS applications usually have a clear-cut purpose, and these applications
can be short-lived: the research is carried out by collecting data, entering
data in the GIS, analysing the data, and producing informative maps. An example
is rapid earthquake damage assessment. Institutional GIS applications,
on the other hand, usually have as their goal the continued administration of
spatial change and the sustained availability of spatial base data. Their needs
for advanced data analysis are usually less, and the complexity of these applications
lies more in the continued provision of trustworthy data to others. They
are thus long-lived applications. Obvious examples are automated cadastral
systems.

1.1.4 Spatial data and geoinformation

A subtle difference exists between the terms data and information. Most of the
time, we use the two terms almost interchangeably, and without the risk of confusing
their meanings. Occasionally, however, we need to be precise about exactly
what it is we are referring to, and in this situation their distinction does
matter.

By data, we mean representations that can be operated upon by a computer.
More specifically, by spatial data we mean data that contains positional values,
such as (x, y) co-ordinates. Sometimes the more precise phrase geospatial data is
used as a further refinement, which refers to spatial data that is georeferenced.
In this book, we will use 'spatial data' as a synonym for 'georeferenced data'. By
information, we mean data that has been interpreted by a human being. Humans
work with and act upon information, not data. Human perception and mental
processing lead to information, and hopefully understanding and knowledge.
Geoinformation is a specific type of information resulting from the interpretation
of spatial data.

As this information is intended to reduce uncertainty in decision-making, any
errors and uncertainties in spatial information products may have practical,
financial and even legal implications for the user. For these reasons, it is important
that those involved in the acquisition and processing of spatial data are able to
assess the quality of the base data and the derived information products. The
International Organization for Standardization (ISO) considers quality to be "the totality of
characteristics of a product that bear on its ability to satisfy a stated and implied
need" (Godwin, 1999). The extent to which errors and other shortcomings of a
data set affect decision making depends on the purpose for which the data is to
be used. For this reason, quality is often defined as 'fitness for use'.

Traditionally, most spatial data were collected and held by individual, specialized
organizations. In recent years, increasing availability and decreasing cost
of data capture equipment has resulted in many users collecting their own data.
However, the collection and maintenance of 'base' data remain the responsibility
of the various governmental agencies, such as National Mapping Agencies
(NMAs), which are responsible for collecting topographic data for the entire
country following pre-set standards. Other agencies such as geological survey
companies, energy supply companies, local government departments, and
many others, all collect and maintain spatial data for their own particular purposes.
If data is to be shared among different users, these users need to know
not only what data exists, where and in what format it is held, but also whether
the data meets their particular quality requirements. This 'data about data' is
known as metadata.
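
A metadata record can be pictured as a small structured description that travels with a data set. The sketch below is purely illustrative: the field names, the sample values and the `meets_requirements` check are assumptions made for this example and do not follow any particular metadata standard.

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """Hypothetical, minimal 'data about data' record."""
    title: str
    custodian: str                # organization that holds the data
    data_format: str              # e.g. "vector" or "raster"
    positional_accuracy_m: float  # stated horizontal accuracy in metres
    last_updated_year: int


def meets_requirements(meta, max_error_m, oldest_acceptable_year):
    """Check a record against one user's quality requirements."""
    return (
        meta.positional_accuracy_m <= max_error_m
        and meta.last_updated_year >= oldest_acceptable_year
    )


topo = DatasetMetadata(
    title="Topographic base map (illustrative)",
    custodian="National Mapping Agency (illustrative)",
    data_format="vector",
    positional_accuracy_m=2.5,
    last_updated_year=2008,
)

# The same data set can be fit for one use and unfit for another.
print(meets_requirements(topo, max_error_m=5.0, oldest_acceptable_year=2005))
print(meets_requirements(topo, max_error_m=1.0, oldest_acceptable_year=2005))
```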

Since the real power of GIS lies in their ability to combine and analyse georeferenced
data from a range of sources, we must pay attention to the issues of data
quality and error, as data from different sources are also likely to contain different
kinds of error. This may include mistakes or variation in the measurement
of position and/or elevation, in the quantitative measurement of attributes or
in the labelling or classification of features. Some degree of error is present in
every spatial data set. It is important, however, to distinguish between gross
errors (blunders or mistakes), which must be detected and removed before the
data is used, and variations in the data caused by unavoidable measurement and
classification errors.

It is possible to make a further distinction between errors in the source data and
processing errors resulting from spatial analysis and modelling operations carried
out by the system on the base data. The nature of positional errors that can arise
during data collection and compilation, including those occurring during digital
data capture, is generally well understood, and a variety of tried and tested
techniques is available to describe and evaluate them (see Section 5.2).

Key components of spatial data quality include positional accuracy (both horizontal
and vertical), temporal accuracy (that the data is up to date), attribute accuracy
(e.g. in labelling of features or of classifications), lineage (history of the data
including sources), completeness (if the data set represents all related features of
reality), and logical consistency (that the data is logically structured).

These components play an important role in assessment of data quality for several
reasons:

1. Even when source data, such as official topographic maps, have been subject
to stringent quality control, errors are introduced when these data are
input to GIS.

2. Unlike a conventional map, which is essentially a single product, a GIS
database normally contains data from different sources of varying quality.

3. Unlike topographic or cadastral databases, natural resource databases contain
data that are inherently uncertain and therefore not suited to conventional
quality control procedures.

4. Most GIS analysis operations will themselves introduce errors. 
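
The quality components named in this section could be recorded alongside a data set as part of its metadata. The following Python sketch is only an illustration of that idea: the class name, the field names, the units and the example values are all assumptions made here, and they do not follow any particular metadata standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QualityMetadata:
    """Hypothetical record of the quality components listed in the text."""
    positional_accuracy_m: Optional[float] = None      # horizontal accuracy, metres
    vertical_accuracy_m: Optional[float] = None        # vertical accuracy, metres
    last_updated: Optional[str] = None                 # temporal accuracy (a date)
    attribute_accuracy: Optional[float] = None         # fraction of correct labels
    lineage: List[str] = field(default_factory=list)   # history of the data, with sources
    completeness: Optional[float] = None               # fraction of features represented
    logically_consistent: Optional[bool] = None


def undocumented_components(meta: QualityMetadata) -> List[str]:
    """Return the names of quality components that have not been filled in."""
    missing = [name for name, value in vars(meta).items() if value is None]
    if not meta.lineage:
        missing.append("lineage")
    return missing


# Invented example: a data set for which only some components are documented.
example = QualityMetadata(
    positional_accuracy_m=5.0,
    last_updated="1999-06-30",
    lineage=["digitized from a topographic map sheet"],
)
print(undocumented_components(example))
```

A prospective user could inspect such a record to judge 'fitness for use' before combining the data set with others.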
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1.2 The real world and representations of it 


One of the main uses of GIS is as a tool to help us make decisions. Specifically, 
we often want to know the best location for a new facility, the most likely sites 
for mosquito habitat, or perhaps identify areas with a high risk of flooding so 
that we can formulate the best policy for prevention. In using GIS to help make 
these decisions, we need to represent some part of the real world as it is, as it 
was, or perhaps as we think it will be. We need to restrict ourselves to 'some
part' of the real world simply because it cannot be represented completely.

The purpose of the El Nino system discussed earlier in this chapter is to administer
SST and WS measurements in various places in the equatorial Pacific Ocean, and
to generate georeferenced, monthly overviews from these. If this is its complete
purpose, the system does not need to store data about the ships that moored the
buoys, the manufacture date of the buoys et cetera. All this data is irrelevant for
the purpose of the system.

The fact that we can only represent parts of the real world teaches us to be humble
about the expectations that we can have about the system: all the data it can
possibly generate for us in the future will be based upon the information which
we provide the system with. Often, we are dealing with processes or phenomena
that change rapidly, or which are difficult to quantify in order to be stored
in a computer. It follows that the ways we collect, organise and structure data
from the real world play a key part in this process.

If we have done our job properly, a computer representation of some part of the
real world will allow us to enter and store data, analyse the data and transfer it
to humans or to other systems.
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1.2.1 Models and modelling 

'Modelling' is a term used in many different ways and which has many different
meanings. A representation of some part of the real world can be considered
a model because the representation will have certain characteristics in common
with the real world, specifically those which we have identified in our model
design. This then allows us to study and operate on the model itself instead of
the real world in order to test what happens under various conditions, and help
us answer 'what if' questions. We can change the data or alter the parameters of
the model, and investigate the effects of the changes.

Models, as representations, come in many different flavours. In the GIS environment,
the most familiar model is that of a map. A map is a miniature representation
of some part of the real world. Paper maps are the most common, but
digital maps also exist, as we shall see in Chapter 7. We will look more closely
at maps below. Databases are another important class of models. A database
can store a considerable amount of data, and also provides various functions to
operate on the stored data. The collection of stored data represents some real
world phenomena, so it too is a model. Obviously, here we are especially interested
in databases that store spatial data. Digital models (as in a database or GIS)
have enormous advantages over paper models (such as maps). They are more
flexible, and therefore more easily changed for the purpose at hand. In principle,
they allow animations and simulations to be carried out by the computer
system. This has opened up an important toolbox that can help to improve our
understanding of the world.

The attentive reader will have noted our threefold use of the word 'model'. This,
perhaps, may be confusing. Besides its use as a verb, where it means 'to describe' or 'to
represent', it is also used as a noun. A 'real world model' is a representation of
a number of phenomena that we can observe in reality, usually to enable some
type of study, administration, computation and/or simulation. In this book we
will use the term application models to refer to models with a specific application,
including real-world models and so-called analytical models. The phrase 'data
modelling' is the common name for the design effort of structuring a database.
This process involves the identification of the kinds of data that the database will
store, as well as the relationships between these kinds of data. We discuss these
issues further in Chapter 3.

Most maps and databases can be considered static models. At any point in time,
they represent a single state of affairs. Usually, developments or changes in
the real world are not easily recognized in these models. Dynamic models or
process models address precisely this issue. They emphasize changes that have
taken place, are taking place or may take place sometime in the future. Dynamic
models are inherently more complicated than static models, and usually require
much more computation. Simulation models are an important class of dynamic
models that allow the simulation of real world processes.

Observe that our El Nino system can be called a static model as it stores state-of-affairs
data such as the average December 1997 temperatures. But at the same
time, it can also be considered a simple dynamic model, because it allows us to 
compare different states of affairs, as Figure 1.1 demonstrates. This is perhaps 
the simplest form of dynamic model: a series of 'static snapshots' allowing us 
to infer some information about the behaviour of the system over time. We will 
return to modelling issues in Chapter 6. 
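
The idea of a series of 'static snapshots' can be made concrete with a small sketch. The Python fragment below is an illustration under stated assumptions: the buoy identifiers and the temperatures are invented here, and the function name `change_between` is hypothetical. It simply compares two stored states of affairs and reports the change per buoy.

```python
# Two static snapshots: monthly average SST (°C) per buoy.
# Buoy identifiers and temperatures are invented illustration values.
snapshots = {
    "1997-12": {"B0001": 28.0, "B0002": 26.5, "B0003": 27.0},
    "1998-12": {"B0001": 25.5, "B0002": 26.0, "B0003": 24.0},
}


def change_between(earlier: dict, later: dict) -> dict:
    """Difference (later minus earlier) for every buoy present in both snapshots."""
    return {
        buoy: round(later[buoy] - earlier[buoy], 2)
        for buoy in earlier
        if buoy in later
    }


change = change_between(snapshots["1997-12"], snapshots["1998-12"])
for buoy, delta in sorted(change.items()):
    print(f"{buoy}: {delta:+.1f} °C")
```

Each snapshot on its own is a static model; only the comparison between them says something about behaviour over time.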
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1.2.2 Maps 

As noted above, maps are perhaps the best known (conventional) models of the
real world. Maps have been used for thousands of years to represent information
about the real world, and continue to be extremely useful for many applications
in various domains. Their conception and design have developed into a science
with a high degree of sophistication. A disadvantage of the traditional paper
map is that it is generally restricted to two-dimensional static representations,
and that it is always displayed at a fixed scale. The map scale determines the
spatial resolution of the graphic feature representation. The smaller the scale,
the less detail a map can show. The accuracy of the base data, on the other
hand, puts limits on the scale at which a map can be sensibly drawn. Hence, the
selection of a proper map scale is one of the first and most important steps in
map design.

A map is always a graphic representation at a certain level of detail, which is
determined by the scale. Map sheets have physical boundaries, and features
spanning two map sheets have to be cut into pieces. Cartography, as the science
and art of map making, functions as an interpreter, translating real world phenomena
(primary data) into correct, clear and understandable representations
for our use. Maps also become a data source for other applications, including
the development of other maps.

With the advent of computer systems, analogue cartography developed into digital
cartography, and computers play an integral part in modern cartography.
Alongside this trend, the role of the map has also changed, and the
dominance of paper maps is eroding in today's increasingly 'digital' world.
The traditional role of paper maps as a data storage medium is being taken over
by (spatial) databases, which offer a number of advantages over 'static' maps,
as discussed in the sections that follow. Notwithstanding these developments,
paper maps remain important tools for the display of spatial information for
many applications.
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1.2.3 Databases 

A database is a repository for storing large amounts of data. It comes with a 
number of useful functions: 

1. A database can be used by multiple users at the same time, i.e. it allows
concurrent use;

2. A database offers a number of techniques for storing data and allows the
use of the most efficient one, i.e. it supports storage optimization;

3. A database allows the imposition of rules on the stored data, rules that will
be automatically checked after each update to the data, i.e. it supports data
integrity;

4. A database offers an easy to use data manipulation language, which allows
the execution of all sorts of data extraction and data updates, i.e. it has a
query facility;

5. A database will try to execute each query in the data manipulation language
in the most efficient way, i.e. it offers query optimization.
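
To illustrate what 'rules on the stored data' can look like in practice, the following sketch uses Python's built-in `sqlite3` module. It is not taken from the El Nino project: the table layout, the plausible temperature range of -2 to 40 °C and the helper `try_insert` are assumptions chosen for illustration. The database itself rejects a row that violates a rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two integrity rules (both assumed for this sketch):
#  - SST must lie in a plausible range of sea water temperatures;
#  - a buoy reports at most one row per date (primary key).
conn.execute("""
    CREATE TABLE DayMeasurements (
        Buoy  TEXT NOT NULL,
        Date  TEXT NOT NULL,
        SST   REAL CHECK (SST BETWEEN -2.0 AND 40.0),
        PRIMARY KEY (Buoy, Date)
    )
""")
conn.execute("INSERT INTO DayMeasurements VALUES ('B0749', '1997/12/03', 28.2)")


def try_insert(row):
    """Attempt an insert; report whether the database accepted the row."""
    try:
        conn.execute("INSERT INTO DayMeasurements VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


accepted_valid = try_insert(("B9204", "1997/12/03", 26.5))
accepted_out_of_range = try_insert(("B1686", "1997/12/03", 278.0))  # typing error
accepted_duplicate = try_insert(("B0749", "1997/12/03", 28.3))      # second report
print(accepted_valid, accepted_out_of_range, accepted_duplicate)
```

The checks run automatically on every update, so the application code does not need to repeat them.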

Databases can store almost any kind of data. Modern database systems, as we
shall see in Section 3.4, organize the stored data in tabular format, not unlike
that of Table 1.1. A database may have many such tables, each of which stores 
data of a certain kind. It is not uncommon for a table to have many thousands 
of data rows, sometimes even hundreds of thousands. For the El Nino project, 
one may assume that the buoys report their measurements on a daily basis and 
that these measurements are stored in a single, large table. 
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DayMeasurements 


Buoy   Date        SST      WS       Humid  Temp10   ...
B0749  1997/12/03  28.2 °C  NNW 4.2  72%    22.2 °C  ...
B9204  1997/12/03  26.5 °C  NW 4.6   63%    20.8 °C  ...
B1686  1997/12/03  27.8 °C  NNW 3.8  78%    22.8 °C  ...
B0988  1997/12/03  27.4 °C  N 1.6    82%    23.8 °C  ...
B3821  1997/12/03  27.5 °C  W 3.2    51%    20.8 °C  ...
B6202  1997/12/03  26.5 °C  SW 4.3   67%    20.5 °C  ...
B1536  1997/12/03  27.7 °C  SSW 4.8  58%    21.4 °C  ...
B0138  1997/12/03  26.2 °C  W 1.9    62%    21.8 °C  ...
B6823  1997/12/03  23.2 °C  S 3.6    61%    22.2 °C  ...

Table 1.2: A stored table (in part) of daily buoy measurements. Illustrated
are only measurements for December 3rd, 1997, though measurements for
other dates are in the table as well. Humid is the air humidity just above
the sea. Temp10 is the measured water temperature at 10 metres depth.
Other measurements are not shown.


The entire El Nino buoy measurements database is likely to have more tables
than the one illustrated. There may be data available about the buoys' maintenance
and service schedules; there may also be data about the gauging of the
sensors on the buoys, possibly including expected error levels. There will almost
certainly be a table that stores the geographic location of each buoy.

Table 1.1 was obtained from table DayMeasurements through the use of a 
query language. A query was defined that computes the monthly average SST 
from the daily measurements, for each buoy. A discussion of the particular 
query language that was used is outside the scope of this book, but we should 
mention that the query was a simple program with just four lines of code. 
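
Although the query language actually used is not shown in this book, a query of this kind can be sketched in SQL, run here through Python's built-in `sqlite3` module. This is an assumption made for illustration, not the original four-line program. The two SST values for 1997/12/03 are taken from Table 1.2; the other rows are invented so that there is something to average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DayMeasurements (Buoy TEXT, Date TEXT, SST REAL)")
conn.executemany(
    "INSERT INTO DayMeasurements VALUES (?, ?, ?)",
    [
        ("B0749", "1997/12/03", 28.2),  # value from Table 1.2
        ("B0749", "1997/12/04", 28.4),  # invented for illustration
        ("B9204", "1997/12/03", 26.5),  # value from Table 1.2
        ("B9204", "1997/12/04", 26.1),  # invented for illustration
        ("B9204", "1998/01/01", 25.0),  # invented, falls in another month
    ],
)

# Average the daily SST values of December 1997, separately for each buoy.
query = """
    SELECT Buoy, AVG(SST) AS AvgSST
    FROM DayMeasurements
    WHERE Date LIKE '1997/12/%'
    GROUP BY Buoy
    ORDER BY Buoy
"""
monthly = [(buoy, round(avg, 2)) for buoy, avg in conn.execute(query)]
print(monthly)
```

The grouping clause is what turns many daily rows into one monthly row per buoy, which is the shape of Table 1.1.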
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1.2.4 Spatial databases and spatial analysis 

A GIS must store its data in some way. For this purpose the previous generation
of software was equipped with relatively rudimentary facilities. Since the 1990s
there has been an increasing trend in GIS applications that used a GIS for spatial
analysis, and used a database for storage. In more recent years, spatial databases
(also known as geodatabases) have emerged. Besides traditional administrative
data, they can store representations of real world geographic phenomena for
use in a GIS. These databases are special because they use additional techniques
different from tables to store these spatial representations.

A geodatabase is not the same thing as a GIS, though both systems share a number
of characteristics. These include the functions listed above for databases in
general: concurrency, storage, integrity, and querying, specifically, but not only,
spatial data. A GIS, on the other hand, is tailored to operate on spatial data. It
'knows' about spatial reference systems, and supports all kinds of analyses that
are inherently geographic in nature, such as distance and area computations and
spatial interpolation. This is probably GIS's main strength: providing various
ways to combine representations of geographic phenomena. GISs, moreover,
have built-in tools for map production, of the paper and the digital kind. They operate
with an 'embedded understanding' of geographic space. Databases typically
lack this kind of understanding.

The phenomena for which we want to store representations in a spatial database
may have point, line, area or image characteristics. Different storage techniques
exist for each of these kinds of spatial data.^ These geographic phenomena have
various relationships with each other and possess spatial (geometric), thematic
and temporal attributes (they exist in space and time). For data management
purposes, phenomena are classified into thematic data layers. The purpose of
the database is usually indicated by a description such as cadastral, topographic,
land use, or soil database.

^Since we also have different analytical techniques for these different types of data, an important
choice in the design of a spatial database application is whether some geographic phenomenon
is better represented as a point, as a line, or as an area. Currently, spatial databases
support the storage of image data, but that support still remains relatively limited.

Spatial analysis is the generic term for all manipulations of spatial data carried
out to improve one's understanding of the geographic phenomena that the data
represents. It involves questions about how the data in various layers might relate
to each other, and how it varies over space. For example, in the El Nino case,
we may want to identify the steepest gradient in water temperature. The aim
of spatial analysis is usually to gain a better understanding of geographic phenomena
through discovering patterns that were previously unknown to us, or to
build arguments on which to base important decisions. It should be noted that
some GIS functions for spatial analysis are simple and easy-to-use, while others are
much more sophisticated, and demand higher levels of analytical and operating
skills. Successful spatial analysis requires appropriate software, hardware, and
perhaps most importantly, a competent user.
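
As a small illustration of the kind of question spatial analysis asks, the sketch below looks for the steepest change in water temperature along a line of buoys. It is a deliberately simplified, one-dimensional example: the positions and temperatures are invented, and the function `steepest_gradient` is a hypothetical helper, not a function of any GIS package.

```python
# (longitude in degrees, SST in °C) for buoys along one line of latitude.
# All values are invented for illustration.
transect = [(160.0, 29.5), (170.0, 29.0), (180.0, 27.0), (190.0, 26.5)]


def steepest_gradient(points):
    """Return (gradient, start, end) for the pair of neighbouring points with
    the largest absolute temperature change per degree of longitude."""
    best = None
    for (x0, t0), (x1, t1) in zip(points, points[1:]):
        gradient = (t1 - t0) / (x1 - x0)
        if best is None or abs(gradient) > abs(best[0]):
            best = (gradient, x0, x1)
    return best


gradient, start, end = steepest_gradient(transect)
print(f"Steepest change: {gradient:+.2f} °C per degree between {start} and {end}")
```

A real analysis would work in two dimensions and account for the spacing of the buoys on the Earth's surface, but the principle of comparing neighbouring values is the same.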
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1.3 Structure of this book 


This chapter has attempted to provide a 'gentle' introduction to GIS. It has discussed
the nature of GIS tools and GIS as a field of scientific research. Much of
the technical detail has been intentionally left out in favour of a broader discussion
of the key issues relating to both of these topics. The chapter has looked
at the purposes of GIS and identified understanding objects and events in geographic
space as the common thread amongst GIS applications, and that spatial
data and spatial data processing are key factors in this understanding. A simple
example of a study of the El Nino effect provided an illustration, without the
technical details.

It was noted that the use of GIS commonly takes place in several phases: data
capture and preparation, storage and maintenance, manipulation and analysis,
and data presentation. Before we get to discussing these phases, the following
two chapters provide more discussion on important background concepts and
issues. In Chapter 2, we will focus the discussion on different kinds of geographic
phenomena and their representation in a GIS, and discuss appropriate
instances of when to use which. Chapter 3 is devoted to a discussion of data processing
systems for spatial data, namely, GIS, databases and spatial databases.

Following these last two chapters, the remaining structure of the book follows 
the phases identified above. In Chapter 5 we look at the phase of data entry and 
preparation: how to ensure that the (spatial) data is correctly entered into the 
GIS, such that it can be used in subsequent analysis. Analysis of geoinformation 
is the focus of Chapter 6. It discusses the most important forms of spatial data 
analysis in some detail, and looks at issues related to spatial modelling. 
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The phase of data visualization is the topic of Chapter 7. This chapter deals with 
fundamental cartographic principles: what to put on a map, where to put it, and 
what techniques to use for specific types of data. Sooner or later, almost all GIS
users will be involved in the presentation of geoinformation (usually of maps), so
it is important to understand the underlying principles.
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Questions 

1. Take another look at the list of professions provided on page 26. Give
two more examples of professions that people are trained in at ITC, and
describe a possible relevant problem in their 'geographic space'.

2. In Section 1.1.1, some examples are given of changes to the Earth's geography.
They were categorized in three types: natural changes, man-made
changes and a combination of the two. Provide additional examples of each
category.

3. What kind of professionals, do you think, were involved in the Tropical
Atmosphere Ocean project of Figure 1.1? Hypothesize about how they
obtained the data to prepare the illustrations of that figure. How do you
think they came up with the nice colour maps?

4. Use arguments obtained from Figure 1.1 to explain why 1997 was an El
Nino year, and why 1998 was not. Also explain why 1998 was in fact a La
Nina year, and not an ordinary year.

5. On page 37, we made the observation that we would assume the data that
we talk about to have been put into a digital format, so that computers
can operate on them. But often, useful data has not been converted in this
way. From your own experience, provide examples of data sources in
non-digital format.

6. Assume the El Nino project is operating with just four buoys, and not 70,
and their location is as illustrated in Figure 1.4. We have already computed
the average SSTs for the month December 1997, which are provided in the
table below. Answer the following questions:

• What is the expected average SST of the illustrated location that is
precisely in the middle of the four buoys?

• What can be said about the expected SST of the illustrated location
that is closer to buoy B0341? Make an educated guess at the temperature
that could have been observed there.

Buoy   Position        SST
B0341  (160° W, 6° N)  30.18 °C
B0871  (180° W, 6° N)  28.34 °C
B8391  (180° W, 6° S)  25.28 °C
B9033  (160° W, 6° S)  28.12 °C

Figure 1.4: Just four measuring buoys

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7. In Table 1.2, we illustrated some stored measurement data. The table uses 
one row of data for a single day that some buoy reports its measurements. 
How many rows do you think the table will store after a full year of project
execution?

The table does not store the geographic location of the buoy involved. Why 
do you think it doesn't do that? How do you think these locations are 
stored? 
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2.1 Models and representations of the real world 

As discussed in the previous chapter, we use GISs to help analyse and understand
more about processes and phenomena in the real world. Section 1.2.1 referred
to the process of modelling, or building a representation which has certain
characteristics in common with the real world. In practical terms, this refers to
the process of representing key aspects of the real world digitally (inside a computer).
These representations are made up of spatial data, stored in memory
in the form of bits and bytes, on media such as the hard drive of a computer.
This digital representation can then be subjected to various analytical functions
(computations) in the GIS, and the output can be visualized in various ways.


Modelling is the process of producing an abstraction of the ‘real world’ so 
that some part of it can be more easily handled. 


Depending on the application domain of the model, it may be necessary to manipulate
the data with specific techniques. To investigate the geology of an area,
we may be interested in obtaining a geological classification. This may result in 
additional computer representations, again stored in bits and bytes. To examine 
how the data is stored inside the GIS, one could look into the actual data files, 
but this information is largely meaningless to a normal user. 

As highlighted in Figure 2.1, the process of translating the relevant aspects
of the real world into a computer representation of it is a domain of expertise 
by itself. It might be achieved through direct observations using sensors, and 
digitizing (converting) the sensor output for computer usage. This is the domain 
of remote sensing, the topic of Principles of Remote Sensing [53]. We may also do 
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this by indirect means: for instance, by making use of the output of a previous 
project, such as a paper map, and re-digitizing it. 



Figure 2.1: Representing relevant aspects of real-world phenomena inside
a GIS to build models or simulations.


In order to better understand both our representation of the phenomena, and
our eventual output from any analysis, we can use the GIS to create visualizations
from the computer representation, either on-screen, printed on paper, or otherwise.^
It is crucial to understand the fundamental differences between these
notions. The real world, after all, is a completely different domain from the 'GIS'
world, in which we build models or simulations of the real world.

Given the complexity of real world phenomena, our models can by definition
never be perfect. We have limitations on the amount of data that we can store,
limits on the amount of detail we can capture, and (usually) limits on the time
we have available for a project. It is therefore possible that some facts or relationships
that exist in the real world may not be discovered through our 'models'.

^It should be mentioned here that illustrations in this chapter, by nature, are visualizations
themselves, although some of them are intended to illustrate a geographic phenomenon or a
computer representation. The map-like illustrations in this chapter purposely do not have a
legend or text tags. They are not intended to be maps.

Any geographic phenomenon can usually be represented in various ways; the 
choice of which representation is best depends mostly on two issues. Firstly, 
what original, raw data (from sensors or otherwise) is available, and secondly, 
what sort of data manipulation is required or will be undertaken. Key aspects 
of data acquisition and preparation are discussed in Chapter 5. This chapter will 
examine various types of geographic phenomena in more depth, and the types 
of computer representations available for them. 
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2.2 Geographic phenomena 




2.2.1 Defining geographic phenomena 

A GIS operates under the assumption that the relevant spatial phenomena occur
in a two- or three-dimensional Euclidean space, unless otherwise specified. Euclidean
space can be informally defined as a model of space in which locations
are represented by coordinates, (x, y) in 2D and (x, y, z) in 3D, and distance and
direction can be defined with geometric formulas. In the 2D case, this is known as the
Euclidean plane, which is the most common Euclidean space in GIS use.
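
The 'geometric formulas' for distance and direction can be written out in a few lines. The Python sketch below is an illustration only: the function names are chosen here, and the direction is expressed in the mathematical convention (degrees counter-clockwise from the positive x-axis), which is an assumption and differs from a compass bearing.

```python
import math


def distance(p, q):
    """Euclidean distance between two points given as (x, y) or (x, y, z)."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))


def direction_deg(p, q):
    """Direction from p to q in the plane, in degrees counter-clockwise
    from the positive x-axis (not a compass bearing)."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360.0


print(distance((0, 0), (3, 4)))          # two-dimensional case
print(distance((0, 0, 0), (2, 3, 6)))    # three-dimensional case
print(direction_deg((0, 0), (0, 5)))     # straight 'up' the y-axis
```

The same distance function serves both the 2D and the 3D case because the formula only sums squared coordinate differences.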

In order to be able to represent relevant aspects of real world phenomena inside
a GIS, we first need to define what it is we are referring to. We might define
a geographic phenomenon as a manifestation of an entity or process of interest
that:


• Can be named or described,

• Can be georeferenced, and

• Can be assigned a time (interval) at which it is/was present.

The relevant phenomena for a given application depend entirely on one's objectives.
For instance, in water management, the objects of study might be river
basins, agro-ecologic units, measurements of actual evapotranspiration, meteorological
data, ground water levels, irrigation levels, water budgets and measurements
of total water use. Note that all of these can be named or described,
georeferenced and provided with a time interval at which each exists. In multipurpose
cadastral administration, the objects of study are different: houses,
land parcels, streets of various types, land use forms, sewage canals and other
forms of urban infrastructure may all play a role. Again, these can be named or
described, georeferenced and assigned a time interval of existence.

Not all relevant phenomena come as triplets (description, georeference, time-interval),
though many do. If the georeference is missing, we seem to have
something of interest that is not positioned in space: an example is a legal document
in a cadastral system. It is obviously somewhere, but its position in space
is not considered relevant. If the time interval is missing, we might have a phenomenon
of interest that is considered to be always there, i.e. the time interval
is (likely to be considered) infinite. If the description is missing, then we have
something that exists in space and time, yet cannot be described. Obviously this
last issue very much limits the usefulness of the information.
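
The triplet and its possible gaps can be represented directly as a small data structure. The following Python sketch is a hypothetical illustration: the class `Phenomenon`, its field names, the coordinates and the dates are all invented here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Phenomenon:
    """Hypothetical (description, georeference, time interval) triplet.
    A member that is not known or not relevant is stored as None."""
    description: Optional[str]
    georeference: Optional[Tuple[float, float]]   # (x, y) position
    time_interval: Optional[Tuple[str, str]]      # (start date, end date)

    def missing_parts(self) -> List[str]:
        """Names of the triplet members that are absent."""
        return [name for name, value in vars(self).items() if value is None]


# Invented examples matching the three cases discussed in the text.
parcel = Phenomenon("land parcel", (512300.0, 4561200.0), ("1995-01-01", "2001-12-31"))
legal_document = Phenomenon("deed of sale", None, ("1998-03-15", "1998-03-15"))
mountain = Phenomenon("mountain ridge", (498000.0, 4570000.0), None)

for item in (parcel, legal_document, mountain):
    print(item.description, "-> missing:", item.missing_parts())
```

Here a missing time interval is read as 'always there', in line with the interpretation given above.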

Referring back to the El Nino example discussed in Chapter 1, one could say 
that there are at least three geographic phenomena of interest there. One is the 
Sea Surface Temperature, and another is the Wind Speed in various places. Both 
are phenomena that we would like to understand better. A third geographic 
phenomenon in that application is the array of monitoring buoys. 
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2 . 2.2 Types of geographic phenomena 

The attempted definition of geographic phenomena above is necessarily abstract,
and therefore perhaps somewhat difficult to grasp. The main reason for
this is that geographic phenomena come in so many different 'flavours', which
we will try to categorize below. Before doing so, we must make two further
observations.

Firstly, representing a phenomenon in a GIS requires us to
state what it is, and where it is. We must provide a description, or at least a
name, on the one hand, and a georeference on the other hand. We will skip
over the temporal issues for now, and come back to these in Section 2.5. The
reason for this is that current GISs do not provide much automatic support for
time-dependent data, and that this topic must therefore be considered an
issue of advanced GIS use.

Secondly, some phenomena manifest themselves essentially everywhere in the
study area, while others only do so in certain localities. If we define our study
area as the equatorial Pacific Ocean, we can say that Sea Surface Temperature
can be measured anywhere in the study area. Therefore, it is a typical example
of a (geographic) field.


A (geographic) field is a geographic phenomenon for which, for every point 
in the study area, a value can be determined. 


Some common examples of geographic fields are air temperature, barometric 
pressure and elevation. These fields are in fact continuous in nature. Examples 
of discrete fields are land use and soil classifications. For these too, any location 
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in the study area is attributed a single land use class or soil class. We discuss 
fields further in Section 2.2.3. 

Many other phenomena do not manifest themselves everywhere in the study 
area, but only in certain localities. The array of buoys of the previous chapter is 
a good example: there is a fixed number of buoys, and for each we know exactly 
where it is located. The buoys are typical examples of (geographic) objects. 


(Geographic) objects populate the study area, and are usually well-distinguished,
discrete, and bounded entities. The space between them is
potentially ‘empty’ or undetermined.


A simple rule-of-thumb is that natural geographic phenomena are usually fields, 
and man-made phenomena are usually objects. Many exceptions to this rule 
actually exist, so one must be careful in applying it. We look at objects in more 
detail in Section 2.2.4. 
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Elevation in the Falset study area, Tarragona province, Spain. The area is approximately 25 x 20 km. The 
illustration has been aesthetically improved by a technique known as ‘hillshading’. In this case, it is as if the 
sun shines from the north-west, giving a shadow effect towards the south-east. Thus, colour alone is not a 
good indicator of elevation; observe that elevation is a continuous function over the space. 



Figure 2.2: A continuous 
field example, namely the 
elevation in the study area 
of Falset, Spain. 

Data source: Department 
of Earth Systems Analysis 
(ESA, ITC) 




2.2.3 Geographic fields 

A field is a geographic phenomenon that has a value 'everywhere' in the study
area. We can therefore think of a field as a mathematical function f that
associates a specific value with any position in the study area. Hence if (x, y) is a
position in the study area, then f(x, y) stands for the value of the field f at
locality (x, y).

Fields can be discrete or continuous. In a continuous field, the underlying function 
is assumed to be 'mathematically smooth', meaning that the field values along 
any path through the study area do not change abruptly, but only gradually. 

Good examples of continuous fields are air temperature, barometric pressure,
soil salinity and elevation. Continuity means that all changes in field values are
gradual. A continuous field can even be differentiable, meaning we can determine
a measure of change in the field value per unit of distance anywhere and in any
direction. For example, if the field is elevation, this measure would be slope, i.e.
the change of elevation per metre distance; if the field is soil salinity, it would be
salinity gradient, i.e. the change of salinity per metre distance. Figure 2.2
illustrates the variation in elevation in a study area in Spain. A colour scheme has
been chosen to depict that variation. This is a typical example of a continuous
field.
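
The idea of a field as a function f(x, y), and of slope as the change in field
value per metre, can be sketched in a few lines of Python. This is only an
illustration: the elevation formula and the names elevation and slope are
hypothetical and are not taken from any GIS package. The slope is approximated
with a central difference over a small step h.

```python
import math

def elevation(x, y):
    """Hypothetical smooth elevation field (metres); x and y are in metres."""
    return 500.0 + 0.05 * x + 20.0 * math.sin(y / 100.0)

def slope(field, x, y, dx, dy, h=1.0):
    """Approximate change of the field value per metre at (x, y),
    measured in the direction (dx, dy), using a central difference."""
    norm = math.hypot(dx, dy)
    ux, uy = dx / norm, dy / norm          # unit vector of the direction
    ahead = field(x + h * ux, y + h * uy)
    behind = field(x - h * ux, y - h * uy)
    return (ahead - behind) / (2.0 * h)

slope_east = slope(elevation, 0.0, 0.0, 1.0, 0.0)    # change per metre going east
slope_north = slope(elevation, 0.0, 0.0, 0.0, 1.0)   # change per metre going north
```

Because the function can be evaluated at any (x, y), the slope can be estimated
anywhere and in any direction, which is what differentiability of a continuous
field amounts to in practice.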

Discrete fields divide the study space into mutually exclusive, bounded parts, with
all locations in one part having the same field value. Typical examples are land
classifications, for instance, using either geological classes, soil type, land use
type, crop type or natural vegetation type. An example of a discrete field (in
this case identifying geological units in the Falset study area) is provided in
Figure 2.3. Observe that locations on the boundary between two parts can be
assigned the field value of the 'left' or 'right' part of that boundary. One may note
that discrete fields are a step from continuous fields towards geographic objects:
discrete fields as well as objects make use of 'bounded' features. Observe,
however, that a discrete field still assigns a value to every location in the study area,
something that is not typical of geographic objects.

Essentially, these two types of fields differ in the type of cell values. A discrete
field like land use type will store cell values of the type 'integer'. Therefore it is
also called an integer raster. Discrete fields can be easily converted to polygons,
since it is relatively easy to draw a boundary line around a group of cells with
the same value. A continuous raster is also called a 'floating point' raster.

A field-based model consists of a finite collection of geographic fields: we may be
interested in elevation, barometric pressure, mean annual rainfall, and maximum
daily evapotranspiration, and thus use four different fields to model the relevant
phenomena within our study area.

[Legend of Figure 2.3: Miocene and Quaternary (lower left); Oligocene (left);
Cretaceous (right); Eocene; Lias; Keuper and Muschelkalk; Buntsandstein;
Intrusive and sedimentary areas]


Observe that, as is typical for fields, only a single geological unit is
associated with any location. As this is a discrete field, value changes are
discontinuous, and therefore locations on the boundary between two units are
not associated with a particular value (i.e. with a geological unit).



Figure 2.3: A discrete field indicating geological units, used in a foundation
engineering study for constructing buildings. The same study area as in
Figure 2.2.

Data source: Department 
of Earth Systems Analysis 
(ESA, ITC)

Data types and values

Since we have now differentiated between continuous and discrete fields, we 
may also look at different kinds of data values which we can use to represent 
our 'phenomena'. It is important to note that some of these data types limit the 
types of analyses that we can do on the data itself: 

1. Nominal data values are values that provide a name or identifier so that
we can discriminate between different values, but that is about all we can
do. Specifically, we cannot do true computations with these values.
Examples are the names of geological units. This kind of data value is called
categorical data when the values assigned are sorted according to some set
of non-overlapping categories. For example, we might identify the soil
type of a given area to belong to a certain (pre-defined) category.

2. Ordinal data values are data values that can be put in some natural sequence
but that do not allow any other type of computation. Household income,
for instance, could be classified as being either 'low', 'average' or 'high'.
Clearly this is their natural sequence, but this is all we can say: we cannot
say that a high income is twice as high as an average income.

3. Interval data values are quantitative, in that they allow simple forms of
computation like addition and subtraction. However, interval data has no
arithmetic zero value, and does not support multiplication or division. For
instance, a temperature of 20 °C is not twice as warm as 10 °C, and thus
centigrade temperatures are interval data values, not ratio data values.

4. Ratio data values allow most, if not all, forms of arithmetic computation.
Ratio data have a natural zero value, and multiplication and division
of values are possible operators (distances measured in metres are an
example). Continuous fields can be expected to have ratio data values, and
hence we can interpolate them.
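
The four kinds of data values can be summarized as a small lookup of which
operations are meaningful for each. The sketch below is illustrative only: the
table ALLOWED and the function names are hypothetical. The temperature example
additionally assumes the Kelvin scale, which has a natural zero and is therefore
a ratio scale, to show why dividing centigrade values is misleading.

```python
# Operations that are meaningful for each kind of data value (illustrative table).
ALLOWED = {
    "nominal":  {"equality"},
    "ordinal":  {"equality", "ordering"},
    "interval": {"equality", "ordering", "difference"},
    "ratio":    {"equality", "ordering", "difference", "ratio"},
}

def is_meaningful(scale, operation):
    """True if the operation makes sense for data values of the given scale."""
    return operation in ALLOWED[scale]

def celsius_to_kelvin(t_celsius):
    """Convert an interval-scale temperature to the ratio-scale Kelvin value."""
    return t_celsius + 273.15

naive_ratio = 20.0 / 10.0   # suggests 'twice as warm', which is not meaningful
kelvin_ratio = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)   # roughly 1.035
```

The difference between the two temperatures (10 degrees) is the same on both
scales, which is why subtraction is allowed for interval data while division is
not.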


We usually refer to nominal and categorical data values as 'qualitative' data,
because we are limited in terms of the computations we can do on this type of data.
Interval and ratio data is known as 'quantitative' data, as it refers to quantities.
However, ordinal data does not seem to fit either of these data types. Often,
ordinal data refers to a ranking scheme or some kind of hierarchical phenomena.
Road networks, for example, are made up of motorways, main roads, and
residential streets. We might expect roads classified as motorways to have more
lanes and carry more traffic than a residential street.

2.2.4 Geographic objects

When a geographic phenomenon is not present everywhere in the study area, 
but somehow 'sparsely' populates it, we look at it as a collection of geographic 
objects. Such objects are usually easily distinguished and named, and their
position in space is determined by a combination of one or more of the following
parameters:

• Location (where is it?), 

• Shape (what form is it?), 

• Size (how big is it?), and 

• Orientation (in which direction is it facing?). 

How we want to use the information about a geographic object determines 
which of the four above parameters is required to represent it. For instance, in 
an in-car navigation system, all that matters about geographic objects like petrol 
stations is where they are. Thus, location alone is enough to describe them in this 
particular context, and shape, size and orientation are not necessarily relevant. 
In the same system, however, roads are important objects, and for these some 
notion of location (where does it begin and end), shape (how many lanes does it 
have), size (how far can one travel on it) and orientation (in which direction can 
one travel on it) seem to be relevant information components. 

Shape is usually important because one of its factors is dimension. This relates to
whether an object is perceived as a point feature, or a linear, area or volume
feature. The petrol stations mentioned above apparently are zero-dimensional, i.e.
they are perceived as points in space; roads are one-dimensional, as they are
considered to be lines in space. In another use of road information (for instance, in
multi-purpose cadastre systems where precise location of sewers and manhole
covers matters) roads might well be considered to be two-dimensional entities,
i.e. areas within which a manhole cover may fall.

Figure 2.4 illustrates geological faults in the Falset study area, a typical example
of a geographic phenomenon that is made up of objects. Each of the faults has
a location, and here the fault's shape is represented as a one-dimensional object.
The size, which is length in case of one-dimensional objects, is also indicated.
Orientation does not play a role in this case.

We usually do not study geographic objects in isolation, but more often we look 
at collections of objects viewed as a unit. These object collections may also have 
specific geographic characteristics. Most of the more interesting collections of 
geographic objects obey certain natural laws. The most common (and obvious) 
of these is that different objects do not occupy the same location. This, for
instance, holds for the collection of petrol stations in an in-car navigation system,
the collection of roads in that system, the collection of land parcels in a cadastral 
system, and in many more cases. We will see in Section 2.3 that this natural law 
of 'mutual non-overlap' has been a guiding principle in the design of computer 
representations of geographic phenomena. 

Figure 2.4: A number of geological faults in the same study area as in
Figure 2.2. Faults are indicated in blue; the study area, with the main
geological eras, is set in grey in the background only as a reference.

Data source: Department of Earth Systems Analysis (ITC)

Collections of geographic objects can be interesting phenomena at a higher
aggregation level: forest plots form forests, groups of parcels form suburbs, streams,
brooks and rivers form a river drainage system, roads form a road network, and
SST buoys form an SST sensor network. It is sometimes useful to view
geographic phenomena at this more aggregated level and look at characteristics like
coverage, connectedness, and capacity. For example:

• Which part of the road network is within 5 km of a petrol station? (A
coverage question)

• What is the shortest route between two cities via the road network? (A
connectedness question)

• How many cars can optimally travel from one city to another in an hour?
(A capacity question)
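
To make the connectedness question concrete, the sketch below finds a shortest
route in a tiny road network using Dijkstra's algorithm. The technique is
swapped in here purely as an illustration and is not prescribed by the text
above; the network ROADS, its city names and its distances are hypothetical.

```python
import heapq

def shortest_route(network, start, goal):
    """Dijkstra's algorithm over a dict {node: {neighbour: distance_km}}.
    Returns (total distance, list of nodes on the route)."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, length in network[node].items():
            if neighbour not in visited:
                heapq.heappush(queue, (dist + length, neighbour, path + [neighbour]))
    return float("inf"), []   # goal cannot be reached from start

# Hypothetical road network: four cities and the road lengths between them (km).
ROADS = {
    "A": {"B": 12.0, "C": 30.0},
    "B": {"A": 12.0, "C": 10.0, "D": 25.0},
    "C": {"A": 30.0, "B": 10.0, "D": 8.0},
    "D": {"B": 25.0, "C": 8.0},
}

distance, route = shortest_route(ROADS, "A", "D")
```

Coverage and capacity questions need other techniques (buffering and network
flow computations, respectively), but they operate on the same kind of
node-and-link description of the network.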

Other spatial relationships between the members of a geographic object
collection may exist and can be relevant in GIS usage. Many of them fall in the
category of topological relationships, discussed in Section 2.3.4.

2.2.5 Boundaries

Where shape and/or size of contiguous areas matter, the notion of boundary
comes into play. This is true for geographic objects but also for the constituents
of a discrete geographic field, as will be clear from another look at Figure 2.3.

Location, shape and size are fully determined if we know an area's boundary,
so the boundary is a good candidate for representing it. This is especially true
for areas that have naturally crisp boundaries. A crisp boundary is one that can
be determined with almost arbitrary precision, dependent only on the data
acquisition technique applied. Fuzzy boundaries contrast with crisp boundaries in
that the boundary is not a precise line, but rather itself an area of transition.

As a general rule-of-thumb, crisp boundaries are more common in man-made
phenomena, whereas fuzzy boundaries are more common with natural
phenomena. In recent years, various research efforts have addressed the issue of
explicit treatment of fuzzy boundaries, but there is still limited support for these
in existing GIS software. The areas identified in a geological classification, like
that of Figure 2.3, are typically vaguely bounded in reality, but applications of
this geological information probably do not require high positional accuracy of
the boundaries involved. Therefore, an assumption that they are actually crisp
boundaries will have little influence on the usefulness of the data.

2.3 Computer representations of geographic information


Up to this point, we have not looked at how geoinformation, like fields and
objects, is represented in a computer. After the discussion of the main
characteristics of geographic phenomena above, let us now examine representation in more
detail. We have seen that various geographic phenomena have the
characteristics of continuous functions over space. Elevation, for instance, can be measured
at many locations, even within one's own backyard, and each location may give
a different value. In order to represent such a phenomenon faithfully in
computer memory, we could either:

• Try to store as many (location, elevation) observation pairs as possible, or

• Try to find a symbolic representation of the elevation field function, as a
formula in x and y (like 3.0678x² + 20.08x − 7.34y or so) which can be
evaluated to give us the elevation at any given (x, y) location.

Both of these approaches have their drawbacks. The first suffers from the fact
that we will never be able to store all elevation values for all locations; after
all, there are infinitely many locations. The second approach suffers from the
fact that we do not know just what this function should look like, and that it
would be extremely difficult to derive such a function for larger areas. In GISs,
typically a combination of both approaches is taken. We store a finite, but
intelligently chosen set of (sample) locations with their elevation. This gives us the
elevation for those stored locations, but not for others. We can use an
interpolation function that allows us to infer a reasonable elevation value for locations
that are not stored. A simple and commonly used interpolation function takes
the elevation value of the nearest location that is stored. But smarter
interpolation functions (involving more than a single stored value) can be used as well,
as may be understood from the SST interpolations of Figure 1.1. Interpolation of
point data is discussed in more detail in Section 5.4.
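
The two kinds of interpolation function mentioned above can be sketched as
follows. The nearest-neighbour function follows the description in the text.
Inverse distance weighting is used here only as one possible example of a
'smarter' function that involves more than one stored value; it is an
assumption for illustration, as are the sample locations in SAMPLES.

```python
import math

# Hypothetical stored samples: (x, y, elevation), all in metres.
SAMPLES = [
    (0.0, 0.0, 100.0),
    (10.0, 0.0, 120.0),
    (0.0, 10.0, 110.0),
    (10.0, 10.0, 150.0),
]

def nearest_neighbour(samples, x, y):
    """Return the elevation of the stored location closest to (x, y)."""
    nearest = min(samples, key=lambda s: math.hypot(s[0] - x, s[1] - y))
    return nearest[2]

def inverse_distance_weighted(samples, x, y, power=2.0):
    """Average all stored elevations, weighting each by 1 / distance**power."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        d = math.hypot(sx - x, sy - y)
        if d == 0.0:
            return value            # (x, y) is itself a stored location
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total
```

At the centre of the four samples, (5, 5), all distances are equal, so the
weighted function returns the plain average of the four elevations, whereas the
nearest-neighbour function jumps abruptly from one stored value to another as
the query location moves.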

Interpolation is made possible by a principle called spatial autocorrelation. This
is a fundamental principle which refers to the fact that locations that are closer
together are more likely to have similar values than locations that are far apart,
commonly referred to as 'Tobler's first law of Geography'. An obvious example
of a phenomenon which exhibits this property is sea-surface temperature, where
one might expect a high degree of correlation between measures taken close
together (refer to the SST example of Chapter 1).

Line objects, either by themselves or in their role of region object boundaries, are
another common example of continuous phenomena that must be finitely
represented. In real life, these objects are usually not straight, and are often erratically
curved. A famous paradoxical question is whether one can actually measure the
length of Great Britain's coastline, i.e. can one measure around rocks, pebbles
or even grains of sand?* In a computer, such random, curvilinear features can
never be fully represented, and usually require some degree of generalization.

* Making the assumption that we can decide where precisely the coastline is ... it may not be
as crisp as we think.

From this it becomes clear that phenomena with intrinsic continuous and/or
infinite characteristics have to be represented with finite means (computer
memory) for computer manipulation, and any finite representation scheme is open
to errors of interpretation. To this end, fields are usually implemented with a
tessellation approach, and objects with a (topological) vector approach. However,
this is not a hard-and-fast rule, as practice sometimes demands otherwise.

In the following sections we discuss tessellations, vector-based representations 
and how these are applied to represent geographic fields and objects. 




2.3.1 Regular tessellations 

A tessellation (or tiling) is a partitioning of space into mutually exclusive cells 
that together make up the complete study space. With each cell, some (thematic) 
value is associated to characterize that part of space. Three regular tessellation 
types are illustrated in Figure 2.5. In a regular tessellation, the cells are the same 
shape and size. The simplest example is a rectangular raster of unit squares, 
represented in a computer in the 2D case as an array of n x m elements (see 
Figure 2.5-left). 



Figure 2.5: The three most common regular tessellation types: square
cells, hexagonal cells, and triangular cells.


In all regular tessellations, the cells are of the same shape and size, and the
field attribute value assigned to a cell is associated with the entire area
occupied by the cell. The square cell tessellation is by far the most commonly used,
mainly because georeferencing a cell is so straightforward. These tessellations
are known under various names in different GIS packages, but most frequently
as rasters.


A raster is a set of regularly spaced (and contiguous) cells with associated
(field) values. The associated values represent cell values, not point values.
This means that the value for a cell is assumed to be valid for all locations
within the cell.


The size of the area that a single raster cell represents is called the raster's
resolution. Sometimes, the word grid is also used, but strictly speaking, a grid refers to
values at the intersections of a network of regularly spaced horizontal and
perpendicular lines (see Figure 2.6). Grids are often used for discrete measurements
that occur at regular intervals. Grid values are often considered synonymous
with raster cells, although they are not.



Figure 2.6: A grid (a) is a collection of regularly spaced (field) values,
while a raster (b) is composed of cells. The associated values with each grid
point or raster cell are not illustrated.


There are some issues related to cell-based partitioning of the study space. The
field value of a cell can be interpreted as one for the complete tessellation cell, in
which case the field is discrete, not continuous or even differentiable. Some
convention is needed to state which value prevails on cell boundaries; with square
cells, this convention often says that lower and left boundaries belong to the cell.
To improve on this continuity issue, we can do two things:

• Make the cell size smaller, so as to make the 'continuity gaps' between the
cells smaller, and/or

• Assume that a cell value only represents elevation for one specific
location in the cell, and provide a good interpolation function for all other
locations that has the continuity characteristic.


Usually, if one wants to use rasters for continuous field representation, one does 
the first but not the second. The second technique is usually considered too 
computationally intensive for large rasters. 

The location associated with a raster cell is fixed by convention, and may be
the cell centroid (mid-point) or, for instance, its left lower corner. Values for
other positions than these must be computed through some form of
interpolation function, which will use one or more nearby field values to compute the
value at the requested position. This allows us to represent continuous, even
differentiable, functions.
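
How a raster cell is tied to a location can be sketched with a small class. This
is a minimal illustration under stated assumptions, not the data model of any
particular GIS: the class Raster and its method names are hypothetical, the
raster is georeferenced by its lower left corner and cell size, row 0 is taken
to be the northernmost row, and the lower and left boundaries of a cell are
taken to belong to that cell.

```python
import math

class Raster:
    """Minimal square-cell raster georeferenced by its lower left corner."""

    def __init__(self, values, x_min, y_min, cell_size):
        self.values = values        # list of rows; row 0 is the northernmost row
        self.x_min = x_min          # x of the raster's lower left corner
        self.y_min = y_min          # y of the raster's lower left corner
        self.cell_size = cell_size  # side length of one cell

    def cell_of(self, x, y):
        """Row and column of the cell containing (x, y).
        Lower and left cell boundaries belong to the cell."""
        col = math.floor((x - self.x_min) / self.cell_size)
        row_from_bottom = math.floor((y - self.y_min) / self.cell_size)
        row = len(self.values) - 1 - row_from_bottom
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            raise ValueError("location outside the raster")
        return row, col

    def centre_of(self, row, col):
        """Coordinates of the mid-point of the cell at (row, col)."""
        x = self.x_min + (col + 0.5) * self.cell_size
        y = self.y_min + (len(self.values) - 1 - row + 0.5) * self.cell_size
        return x, y

    def value_at(self, x, y):
        """Cell value, assumed valid for every location within the cell."""
        row, col = self.cell_of(x, y)
        return self.values[row][col]

raster = Raster([[1, 2], [3, 4]], x_min=1000.0, y_min=2000.0, cell_size=10.0)
```

Here value_at returns the same value for every location inside a cell, which is
the discrete interpretation; an interpolating variant would instead combine the
values of nearby cell centres obtained from centre_of.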

An important advantage of regular tessellations is that we know how they
partition space, and we can make our computations specific to this partitioning. This
leads to fast algorithms. An obvious disadvantage is that they are not adaptive
to the spatial phenomenon we want to represent. The cell boundaries are both
artificial and fixed: they may or may not coincide with the boundaries of the
phenomena of interest. For example, suppose we use any of the above regular
tessellations to represent elevation in a perfectly flat area. In this case we need
just as many cells as in a strongly undulating terrain: the data structure does not
adapt to the lack of relief. We would, for instance, still use the n x m cells for the
raster, although the elevation might be 1500 m above sea level everywhere.

2.3.2 Irregular tessellations

Above, we discussed that regular tessellations provide simple structures with
straightforward algorithms, which are, however, not adaptive to the phenomena
they represent. Essentially this means they might not represent the phenomena
in the most efficient way. For this reason, substantial research effort has also been
put into irregular tessellations. Again, these are partitions of space into mutually
disjoint cells, but now the cells may vary in size and shape, allowing them to
adapt to the spatial phenomena that they represent. We discuss here only one
type, namely the region quadtree, but we point out that many more structures
have been proposed in the literature, and have also been implemented.

Irregular tessellations are more complex than the regular ones, but they are also
more adaptive, which typically leads to a reduction in the amount of memory
used to store the data. A well-known data structure in this family, upon which
many more variations have been based, is the region quadtree. It is based on a
regular tessellation of square cells, but takes advantage of cases where
neighbouring cells have the same field value, so that they can together be represented
as one bigger cell. A simple illustration is provided in Figure 2.7. It shows a
small 8 x 8 raster with three possible field values: white, green and blue. The
quadtree that represents this raster is constructed by repeatedly splitting up the
area into four quadrants, which are called NW, NE, SE, SW for obvious
reasons. This procedure stops when all the cells in a quadrant have the same field
value. The procedure produces an upside-down, tree-like structure, known as a
quadtree. In main memory, the nodes of a quadtree (both circles and squares in
the figure below) are represented as records. The links between them are
pointers, a programming technique to address (i.e. to point to) other records.

Figure 2.7: An 8 x 8, three-valued raster (here: colours) and its
representation as a region quadtree. To construct the quadtree, the field
is successively split into four quadrants until parts have only a single field
value. After the first split, the southeast quadrant is entirely green, and this
is indicated by a green square at level two of the tree. Other quadrants had
to be split further.


Quadtrees are adaptive because they apply the spatial autocorrelation principle,
i.e. that locations that are near in space are likely to have similar field values.
When a conglomerate of cells has the same value, they are represented together
in the quadtree, provided boundaries coincide with the predefined quadrant
boundaries. This is why we can also state that a quadtree provides a nested
tessellation: quadrants are only split if they have two or more values. The square
nodes at the same level represent equal area sizes, allowing quick computation
of the area associated with some field value. The top node of the tree represents
the complete raster.
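
The splitting procedure described above can be sketched recursively. This is a
minimal illustration that assumes a square raster whose side is a power of two.
Nested dictionaries stand in for the records and pointers mentioned in the text,
and the 4 x 4 raster RASTER is a hypothetical example, not the raster of
Figure 2.7.

```python
def build_quadtree(raster, row=0, col=0, size=None):
    """Return the field value if the square block is uniform (a leaf),
    otherwise a dict holding the four sub-quadrants NW, NE, SE and SW."""
    if size is None:
        size = len(raster)   # assumes a square raster, side a power of two
    first = raster[row][col]
    uniform = all(
        raster[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if uniform:
        return first         # stop splitting: one value for the whole block
    half = size // 2
    return {
        "NW": build_quadtree(raster, row, col, half),
        "NE": build_quadtree(raster, row, col + half, half),
        "SE": build_quadtree(raster, row + half, col + half, half),
        "SW": build_quadtree(raster, row + half, col, half),
    }

def count_leaves(node):
    """Number of leaf nodes, i.e. cells actually stored by the quadtree."""
    if isinstance(node, dict):
        return sum(count_leaves(child) for child in node.values())
    return 1

# Hypothetical 4 x 4 raster with values W(hite), G(reen) and B(lue).
RASTER = [
    ["W", "W", "G", "G"],
    ["W", "W", "G", "B"],
    ["B", "B", "G", "G"],
    ["B", "B", "G", "G"],
]

tree = build_quadtree(RASTER)
```

In this example only the NE quadrant holds more than one value and is split
further, so the tree stores 7 leaves where the raster stores 16 cells.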

To summarise the above discussion, we can say that tessellations partition the
study space into cells, and assign a value to each cell. A raster is a regular
tessellation with square cells (by far the most commonly used). The method by which
the study space is split into cells is (to some degree) arbitrary, as cell boundaries
usually have little or no bearing to the real world phenomena that are
represented.

2.3.3 Vector representations

Tessellations do not explicitly store georeferences of the phenomena they
represent. Instead, they provide a georeference of the lower left corner of the raster,
for instance, plus an indicator of the raster's resolution, thereby implicitly
providing georeferences for all cells in the raster. In vector representations, an attempt
is made to explicitly associate georeferences with the geographic phenomena. A
georeference is a coordinate pair from some geographic space, and is also known
as a vector. This explains the name. Below, we discuss various vector
representations. We start our discussion with the TIN, a representation for geographic
fields that can be considered a hybrid between tessellations and vector
representations.

Figure 2.8: Input locations and their (elevation) values for a TIN construction.
The location P is an arbitrary location that has no associated elevation
measurement.


Triangulated Irregular Networks 

A commonly used data structure in GIS software is the triangulated irregular
network, or TIN. It is one of the standard implementation techniques for digital
terrain models, but it can be used to represent any continuous field. The
principles behind a TIN are simple. It is built from a set of locations for which we
have a measurement, for instance an elevation. The locations can be
arbitrarily scattered in space, and are usually not on a nice regular grid. Any location
together with its elevation value can be viewed as a point in three-dimensional
space. This is illustrated in Figure 2.8. From these 3D points, we can construct
an irregular tessellation made of triangles. Two such tessellations are illustrated
in Figure 2.9.

In three-dimensional space, three points uniquely determine a plane, as long as
they are not collinear, i.e. they must not be positioned on the same line. A plane
fitted through these points has a fixed aspect and gradient, and can be used
to compute an approximation of elevation of other locations.* Since we can pick
many triples of points, we can construct many such planes, and therefore we can
have many elevation approximations for a single location, such as P (Figure 2.8).

So, it is wise to restrict the use of a plane to the triangular area 'between' the three
points.

* Slope is usually defined to consist of two parts: the gradient and the aspect. The gradient is a
steepness measure indicating the maximum rate of elevation change, indicated as a percentage
or angle. The aspect is an indication of which way the slope is facing; it can be defined as the
compass direction of the gradient. More can be found in Section 6.4.4.

Figure 2.9: Two triangulations based on the input locations of Figure 2.8.
(a) one with many ‘stretched’ triangles; (b) the triangles are more
equilateral; this is a Delaunay triangulation.

If we restrict the use of a plane to the area between its three anchor points, we
obtain a triangular tessellation of the complete study space. Unfortunately, there are
many different tessellations for a given input set of anchor points, as Figure 2.9
demonstrates with two of them. Some tessellations are better than others, in the
sense that they make smaller errors of elevation approximation. For instance, if
we base our elevation computation for location P on the left hand shaded
triangle, we will get another value than from the right hand shaded triangle. The
second will provide a better approximation because the average distance from
P to the three triangle anchors is smaller.

The triangulation of Figure 2.9(b) happens to be a Delaunay triangulation, which
in a sense is an optimal triangulation. There are multiple ways of defining what
such a triangulation is (see [46]), but it suffices here to state two important
properties. The first is that the triangles are as equilateral ('equal-sided') as they
can be, given the set of anchor points. The second property is that for each
triangle, the circumcircle through its three anchor points does not contain any other
anchor point. One such circumcircle is depicted on the right of Figure 2.9(b).
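
The second property, the empty circumcircle, can be checked directly. The sketch
below uses the standard determinant test for whether a point lies inside the
circle through three others. It only verifies a given triangulation and does not
construct one. The function names and the four anchor points are hypothetical,
chosen so that one triangulation has 'stretched' triangles and the other does
not.

```python
def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circle through a, b and c.
    Works for either orientation (clockwise or counter-clockwise) of a, b, c."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The sign of det depends on the orientation of the triangle; correct for it.
    orientation = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    return det * orientation > 0

def is_delaunay(points, triangles):
    """True if no anchor point lies inside the circumcircle of any triangle.
    Triangles are given as triples of indices into points."""
    for i, j, k in triangles:
        for m, p in enumerate(points):
            if m not in (i, j, k) and in_circumcircle(points[i], points[j], points[k], p):
                return False
    return True

# Hypothetical anchor points forming a flat diamond.
POINTS = [(0, 0), (3, -1), (6, 0), (3, 1)]
STRETCHED = [(0, 1, 2), (0, 2, 3)]   # split along the long diagonal
COMPACT = [(0, 1, 3), (1, 2, 3)]     # split along the short diagonal
```

For these points, is_delaunay(POINTS, COMPACT) holds while
is_delaunay(POINTS, STRETCHED) does not, which matches the observation that the
Delaunay triangulation avoids stretched triangles.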

A TIN clearly is a vector representation: each anchor point has a stored
georeference. Yet, we might also call it an irregular tessellation, as the chosen
triangulation provides a partitioning of the entire study space. However, in this case,
the cells do not have an associated stored value as is typical of tessellations, but
rather a simple interpolation function that uses the elevation values of its three
anchor points.
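
The interpolation function of a single TIN triangle can be sketched as the plane
z = a*x + b*y + c through its three anchor points. The function names and the
anchor coordinates below are hypothetical. The aspect is computed here as the
compass direction of steepest descent (0 = north, clockwise), which is one
common convention and an assumption of this sketch.

```python
import math

def plane_through(p1, p2, p3):
    """Coefficients (a, b, c) of the plane z = a*x + b*y + c through
    three non-collinear points given as (x, y, z)."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p1, p2, p3
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if det == 0:
        raise ValueError("anchor points are collinear")
    a = ((z2 - z1) * (y3 - y1) - (z3 - z1) * (y2 - y1)) / det
    b = ((x2 - x1) * (z3 - z1) - (x3 - x1) * (z2 - z1)) / det
    c = z1 - a * x1 - b * y1
    return a, b, c

def elevation_at(plane, x, y):
    """Elevation approximation at (x, y); meant for locations inside the triangle."""
    a, b, c = plane
    return a * x + b * y + c

def gradient_percent(plane):
    """Maximum rate of elevation change of the plane, as a percentage."""
    a, b, _ = plane
    return 100.0 * math.hypot(a, b)

def aspect_degrees(plane):
    """Compass direction in which the plane faces (direction of steepest descent).
    Not defined for a horizontal plane."""
    a, b, _ = plane
    return math.degrees(math.atan2(-a, -b)) % 360.0

# Hypothetical anchor points: terrain rising by 1 m per metre towards the east.
plane = plane_through((0.0, 0.0, 100.0), (10.0, 0.0, 110.0), (0.0, 10.0, 100.0))
```

With these anchors the plane rises eastwards, so it faces west: the gradient is
100 percent and the aspect is 270 degrees.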

Point representations

Points are defined as single coordinate pairs (x, y) when we work in 2D, or
coordinate triplets (x, y, z) when we work in 3D. The choice of coordinate system
is another matter, which we will discuss in Chapter 4.

Points are used to represent objects that are best described as shape- and
sizeless, zero-dimensional features. Whether this is the case really depends on the
purposes of the spatial application and also on the spatial extent of the objects 
compared to the scale applied in the application. For a tourist city map, a park 
will not usually be considered a point feature, but perhaps a museum will, and 
certainly a public phone booth might be represented as a point. 

Besides the georeference, usually extra data is stored for each point object. This
so-called attribute or thematic data can capture anything that is considered
relevant about the object. For phone booth objects, this may include the owning
telephone company, the phone number, or the date last serviced.

Line representations

Line data are used to represent one-dimensional objects such as roads, railroads, 
canals, rivers and power lines. Again, there is an issue of relevance for the
application and the scale that the application requires. For the example application of
mapping tourist information, bus, subway and streetcar routes are likely to be 
relevant line features. Some cadastral systems, on the other hand, may consider 
roads to be two-dimensional features, i.e. having a width as well. 

Above, we discussed the notion that arbitrary, continuous curvilinear features
are just as difficult to represent as continuous fields. GISs therefore
approximate such features (finitely!) as lists of nodes. The two end nodes and
zero or more internal nodes or vertices define a line. Other terms for 'line'
that are commonly used in some GISs are polyline, arc or edge. A node or vertex
is like a point (as discussed above) but it only serves to define the line, and
to provide shape in order to obtain a better approximation of the actual feature.

The straight parts of a line between two consecutive vertices or end nodes are
called line segments. Many GISs store a line as a simple sequence of coordinates
of its end nodes and vertices, assuming that all its segments are straight. This
is usually good enough, as cases in which a single straight line segment is
considered an unsatisfactory representation can be dealt with by using multiple
(smaller) line segments instead of only one.
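The storage scheme just described can be sketched as an ordered coordinate list from which the straight segments are derived. The coordinates and function names are made up for illustration.

```python
import math

# A line stored as an ordered coordinate list:
# first end node, three internal vertices, last end node.
line = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0), (6.0, 8.0), (9.0, 12.0)]


def segments(coords):
    """Pair up consecutive coordinates into straight line segments."""
    return list(zip(coords, coords[1:]))


def length(coords):
    """Length of the line, assuming every segment is straight."""
    return sum(math.dist(p, q) for p, q in segments(coords))
```

With two end nodes and three vertices, `segments(line)` yields four segments. Adding more vertices refines the approximation of a curved feature without changing the data structure.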

Still, there are cases in which we would like to have the opportunity to use
arbitrary curvilinear features as representation of real-world phenomena, but
many systems do not at present accommodate such shapes. If a GIS supports some of
these curvilinear features, it does so using parameterized mathematical
descriptions. A discussion of these more advanced techniques is beyond the
purpose of this textbook.

Figure 2.10: A line is defined by its two end nodes and zero or more internal
nodes, also known as vertices. This line representation has three vertices, and
therefore four line segments.

Collections of (connected) lines may represent phenomena that are best viewed as
networks. With networks, specific types of interesting questions arise that have
to do with connectivity and network capacity. These relate to applications such
as traffic monitoring and watershed management. With network elements (i.e. the
lines that make up the network), extra values are commonly associated, such as
distance, quality of the link, or carrying capacity.
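To illustrate the kind of connectivity question a network supports, here is a small sketch in which each line carries a distance value and the shortest route between two nodes is found with Dijkstra's algorithm. The algorithm is our choice for the example, not something the text prescribes, and the node names and distances are hypothetical.

```python
import heapq

# Hypothetical network: each line links two nodes and carries a distance (km).
links = [("a", "b", 4.0), ("b", "c", 3.0), ("a", "c", 9.0), ("c", "d", 2.0)]


def build_graph(links):
    """Turn the list of lines into an undirected adjacency structure."""
    graph = {}
    for u, v, dist in links:
        graph.setdefault(u, []).append((v, dist))
        graph.setdefault(v, []).append((u, dist))
    return graph


def shortest_distance(graph, start, goal):
    """Dijkstra's algorithm; returns None when the goal cannot be reached."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            return d
        if d > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, weight in graph.get(node, []):
            nd = d + weight
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return None
```

Other link values, such as carrying capacity, could be stored alongside the distance in the same way.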

Area representations

When area objects are stored using a vector approach, the usual technique is
to apply a boundary model. This means that each area feature is represented
by some arc/node structure that determines a polygon as the area's boundary.
Common sense dictates that area features of the same kind are best stored
in a single data layer, represented by mutually non-overlapping polygons. In
essence, what we then get is an application-determined (i.e. adaptive) partition
of space.

Observe that a polygon representation for an area object is yet another example
of a finite approximation of a phenomenon that inherently may have a curvilinear
boundary. In the case that the object can be perceived as having a fuzzy
boundary, a polygon is an even worse approximation, though potentially the
only one possible. An example is provided in Figure 2.11. It illustrates a simple
study with three area objects, represented by polygon boundaries. Clearly, we
expect additional data to accompany the area data. Such information could be
stored in database tables.


Figure 2.11: Areas as they are represented by their boundaries. Each boundary is
a cyclic sequence of line features; each line, as before, is a sequence of two
end nodes with, in between, zero or more vertices.

A simple but naive representation of area features would be to list for each
polygon simply the list of lines that describes its boundary. Each line in the
list would, as before, be a sequence that starts with a node and ends with one,
possibly with vertices in between. But this is far from optimal. To understand
why this is the case, take a closer look at the shared boundary between the
bottom left and right polygons in Figure 2.11. The line that makes up the
boundary between them is the same, which means that using the above
representation the line would be stored twice, namely once for each polygon. This
is a form of data duplication, known as data redundancy, which is (at least in
theory) unnecessary, although it remains a feature of some systems.

There is another disadvantage to such polygon-by-polygon representations. If we 
want to find out which polygons border the bottom left polygon, we have to do 
a rather complicated and time-consuming analysis comparing the vertex lists of 
all boundary lines with that of the bottom left polygon. In the case of Figure 2.11, 
with just three polygons, this is fine, but when our data set has 5,000 polygons, 
with perhaps a total of 25,000 boundary lines, even the fastest computers will 
take their time in finding neighbouring polygons. 

The boundary model is an improved representation that deals with these
disadvantages. It stores parts of a polygon's boundary as non-looping arcs and
indicates which polygon is on the left and which is on the right of each arc. A
simple example of the boundary model is provided in Figure 2.12. It illustrates
which additional information is stored about spatial relationships between lines
and polygons. Obviously, real coordinates for nodes (and vertices) will also be
stored in another table.

Figure 2.12: A simple boundary model for the polygons A, B and C. For each arc,
we store the start and end node (as well as a vertex list, but these have been
omitted from the table), its left and right polygon. The 'polygon' W denotes the
outside world polygon.

The boundary model is sometimes also called the topological data model as it
captures some topological information, such as polygon neighbourhood. Observe
that it is a simple query to find all the polygons that are the neighbour of some
given polygon, unlike the case above.
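A minimal sketch of such a neighbourhood query follows. The arc table is hypothetical (it is not the table of Figure 2.12): three polygons A, B and C meet at an interior node 4, and W is the outside world polygon.

```python
# Hypothetical arc table: (arc id, start node, end node, left polygon, right polygon).
arcs = [
    ("a1", 1, 2, "A", "W"),
    ("a2", 2, 3, "B", "W"),
    ("a3", 3, 1, "C", "W"),
    ("a4", 4, 1, "A", "C"),
    ("a5", 4, 2, "B", "A"),
    ("a6", 4, 3, "C", "B"),
]


def neighbours(polygon, arcs):
    """Polygons that share at least one arc with the given polygon."""
    found = set()
    for _arc_id, _start, _end, left, right in arcs:
        if left == polygon:
            found.add(right)
        elif right == polygon:
            found.add(left)
    return found
```

A single pass over the arc table suffices and no coordinates are compared. Each shared arc is also stored only once, which avoids the redundancy of the polygon-by-polygon representation.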

2.3.4 Topology and spatial relationships

General spatial topology


Topology deals with spatial properties that do not change under certain
transformations. For example, features drawn on a sheet of rubber (as in
Figure 2.13) can be made to change in shape and size by stretching and pulling
the sheet. However, some properties of these features do not change:


• Area E is still inside area D, 

• The neighbourhood relationships between A, B, C, D, and E stay intact, 
and their boundaries have the same start and end nodes, and 

• The areas are still bounded by the same boundaries, only the shapes and 
lengths of their perimeters have changed. 



Figure 2.13: Rubber sheet transformation: The space is transformed, yet many
relationships between the constituents remain unchanged.


Topology refers to the spatial relationships between geographical elements 
in a data set that do not change under a continuous transformation. 

Topological relationships are built from simple elements into more complex
elements: nodes define line segments, and line segments connect to define lines,
which in turn define polygons. The fundamental issues relating to order,
connectivity and adjacency of geographical elements form the basis of more
sophisticated GIS analyses. These relationships (called topological properties)
are invariant under a continuous transformation, referred to as a topological
mapping.


In what follows below, we will look at aspects of topology in two ways. Firstly,
using simplices, we will look at how simple elements (points) can be combined
to define more complex ones (lines and polygons). Secondly, we will examine
the logical aspects of topological relationships using set theory. The
three-dimensional case is also briefly discussed.

Topological relationships

The mathematical properties of the geometric space used for spatial data can be
described as follows:

• The space is a three-dimensional Euclidean space where for every point we
can determine its three-dimensional coordinates as a triple (x, y, z) of real
numbers. In this space, we can define features like points, lines, polygons,
and volumes as geometric primitives of the respective dimension. A point
is zero-dimensional, a line one-dimensional, a polygon two-dimensional,
and a volume is a three-dimensional primitive.

• The space is a metric space, which means that we can always compute the
distance between two points according to a given distance function. Such
a function is also known as a metric.

• The space is a topological space, of which the definition is a bit
complicated. In essence, for every point in the space we can find a
neighbourhood around it that fully belongs to that space as well.

• Interior and boundary are properties of spatial features that remain
invariant under topological mappings. This means that under any topological
mapping, the interior and the boundary of a feature remain unbroken and
intact.

Figure 2.14: Simplices (0-simplex, 1-simplex, and further) and a simplicial
complex. Features are approximated by a set of points, line segments, triangles,
and tetrahedrons.

There are a number of advantages when our computer representations of
geographic phenomena have built-in sensitivity to topological issues. Questions
related to the 'neighbourhood' of an area are a case in point. To obtain some
'topological sensitivity', simple building blocks have been proposed with which
more complicated representations can be constructed:

• We can define within the topological space, features that are easy to handle
and that can be used as representations of geographic objects. These features
are called simplices as they are the simplest geometric shapes of some
dimension: point (0-simplex), line segment (1-simplex), triangle (2-simplex),
and tetrahedron (3-simplex).

• When we combine various simplices into a single feature, we obtain a
simplicial complex. Figure 2.14 provides examples.

As the topological characteristics of simplices are well-known, we can infer the
topological characteristics of a simplicial complex from the way it was
constructed.

The topology of two dimensions

We can use the topological properties of interior and boundary to define
relationships between spatial features. Since the properties of interior and
boundary do not change under topological mappings, we can investigate their
possible relations between spatial features.* We can define the interior of a
region R as the largest set of points of R for which we can construct a disk-like
environment around each of them (no matter how small) that also falls completely
inside R. The boundary of R is the set of those points belonging to R but that do
not belong to the interior of R, i.e. one cannot construct a disk-like
environment around such points that still belongs to R completely.

Suppose we consider a spatial region A. It has a boundary and an interior,
both seen as (infinite) sets of points, and which are denoted by boundary(A) and
interior(A), respectively. We consider all possible combinations of intersections
(∩) between the boundary and the interior of A with those of another region
B, and test whether they are the empty set (∅) or not. From these intersection
patterns, we can derive eight (mutually exclusive) spatial relationships between
two regions. If, for instance, the interiors of A and B do not intersect, but
their boundaries do, yet a boundary of one does not intersect the interior of the
other, we say that A and B meet. In mathematics, we can therefore define the
meets relationship using set theory, as

\[
\begin{aligned}
A \;\mathit{meets}\; B \;\stackrel{\mathrm{def}}{=}\;
  & \mathit{interior}(A) \cap \mathit{interior}(B) = \emptyset \;\land \\
  & \mathit{boundary}(A) \cap \mathit{boundary}(B) \neq \emptyset \;\land \\
  & \mathit{interior}(A) \cap \mathit{boundary}(B) = \emptyset \;\land \\
  & \mathit{boundary}(A) \cap \mathit{interior}(B) = \emptyset .
\end{aligned}
\]

In the above formula, the symbol ∧ (\land) expresses the logical connective
'and'. Thus, the formula states four properties that must all hold for the
formula to be true.

* We restrict ourselves here to relationships between spatial regions (i.e.
two-dimensional features without holes).
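The four conditions of the meets relationship can be tried out on a discrete stand-in for regions. In the sketch below, which is a simplification of our own, a region is the set of integer lattice points of a closed rectangle, and a point counts as interior when all eight of its lattice neighbours also belong to the region. True regions are infinite point sets, so this only illustrates the logic.

```python
from itertools import product


def rect(x0, y0, x1, y1):
    """Lattice points of the closed rectangle [x0, x1] x [y0, y1]."""
    return {(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}


def interior(region):
    """Points whose eight lattice neighbours all belong to the region."""
    return {
        (x, y)
        for (x, y) in region
        if all((x + dx, y + dy) in region
               for dx, dy in product((-1, 0, 1), repeat=2))
    }


def boundary(region):
    """Points of the region that are not interior points."""
    return region - interior(region)


def meets(a, b):
    """The four intersection tests that define the meets relationship."""
    return (
        not (interior(a) & interior(b))
        and bool(boundary(a) & boundary(b))
        and not (interior(a) & boundary(b))
        and not (boundary(a) & interior(b))
    )
```

Two rectangles that share only an edge, such as `rect(0, 0, 2, 2)` and `rect(2, 0, 4, 2)`, satisfy all four conditions. Rectangles that overlap or lie apart fail at least one of them.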




Figure 2.15: Spatial relationships between two regions derived from the
topological invariants of intersections of boundary and interior. The
relationships can be read with the green region on the left... and the blue
region on the right... (Panel labels include: '... is equal to ...',
'... is inside ...', '... covers ...' and '... overlaps ...'.)

Figure 2.15 shows all eight spatial relationships: disjoint, meets, equals,
inside, covered by, contains, covers, and overlaps. These relationships can be
used in queries against a spatial database, and represent the 'building blocks'
of more complex spatial queries.

It turns out that the rules of how simplices and simplicial complexes can be
embedded in space are quite different for two-dimensional space than they are
for three-dimensional space. Such a set of rules defines the topological
consistency of that space. It can be proven that if the rules below are satisfied
for all features in a two-dimensional space, the features define a topologically
consistent configuration in 2D space. The rules are illustrated in Figure 2.16.

1. Every 1-simplex ('arc') must be bounded by two 0-simplices ('nodes',
namely its begin and end node).

2. Every 1-simplex borders two 2-simplices ('polygons', namely its 'left'
and 'right' polygons).

3. Every 2-simplex has a closed boundary consisting of an alternating 
(and cyclic) sequence of 0- and 1-simplices. 

4. Around every 0-simplex exists an alternating (and cyclic) sequence of 
1- and 2-simplices. 

5. 1-simplices only intersect at their (bounding) nodes. 
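Some of these rules can be checked mechanically on an arc table. The sketch below tests rules 1 and 2 directly and applies a necessary (but not sufficient) condition for rule 3: in a closed boundary, every node is used an even number of times by the arcs of a polygon. Rules 4 and 5 need geometry and are not covered. The table layout and all names are assumptions made for this illustration.

```python
from collections import Counter

# Hypothetical arc table: (arc id, start node, end node, left polygon, right polygon).
arcs = [
    ("a1", 1, 2, "A", "W"),
    ("a2", 2, 3, "B", "W"),
    ("a3", 3, 1, "C", "W"),
    ("a4", 4, 1, "A", "C"),
    ("a5", 4, 2, "B", "A"),
    ("a6", 4, 3, "C", "B"),
]


def check_arcs(arcs):
    """Return a list of detected rule violations (empty when none are found)."""
    problems = []
    nodes_per_polygon = {}
    for arc_id, start, end, left, right in arcs:
        if start is None or end is None:
            problems.append(f"rule 1: arc {arc_id} lacks a begin or end node")
            continue
        if left is None or right is None:
            problems.append(f"rule 2: arc {arc_id} lacks a left or right polygon")
            continue
        for polygon in (left, right):
            nodes_per_polygon.setdefault(polygon, Counter()).update([start, end])
    for polygon, counts in sorted(nodes_per_polygon.items()):
        # A closed boundary enters and leaves each node, so node use is even.
        if any(count % 2 for count in counts.values()):
            problems.append(f"rule 3: boundary of polygon {polygon} is not closed")
    return problems
```

Dropping the last arc from the table leaves the boundaries of polygons B and C open, which the check reports.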



Figure 2.16: The five rules of topological consistency in two-dimensional
space. (Panel labels include: 'rule (1)' and 'rules (2, 5)'.)

The three-dimensional case

It is not without reason that our discussion of vector representations and spatial
topology has focused mostly on objects in two-dimensional space. The history
of spatial data handling is almost purely 2D, and this remains the case for
the majority of present-day GIS applications. Many application domains make
use of elevational data, but these are usually accommodated by so-called 2½D
data structures. These 2½D data structures are similar to the (above discussed)
2D data structures using points, lines and areas. They also apply the rules of
two-dimensional topology, as they were illustrated in Figure 2.16. This means
that different lines cannot cross without intersecting nodes, and that different
areas cannot overlap.

There is, on the other hand, one important aspect in which 2½D data does differ
from standard 2D data, and that is in their association of an additional
z-value with each 0-simplex ('node'). Thus, nodes also have an elevation value
associated with them. Essentially, this allows the GIS user to represent 1- and
2-simplices that are non-horizontal, and therefore, a piecewise planar, 'wrinkled
surface' can be constructed as well, much like a TIN. Note however, that one
cannot have two different nodes with identical x- and y-coordinates, but
different z-values. Such nodes would constitute a perfectly vertical feature, and
this is not allowed. Consequently, true solids cannot be represented in a 2½D GIS.
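The restriction on node positions can be expressed as a simple check: no (x, y) position may carry more than one z-value. The function name and sample nodes below are hypothetical.

```python
def vertical_conflicts(nodes):
    """Given (x, y, z) nodes, return the (x, y) positions with more than one z-value."""
    seen = {}
    for x, y, z in nodes:
        seen.setdefault((x, y), set()).add(z)
    return sorted(xy for xy, zs in seen.items() if len(zs) > 1)


nodes = [(0.0, 0.0, 10.0), (1.0, 0.0, 12.0), (1.0, 0.0, 15.0)]
conflicts = vertical_conflicts(nodes)  # the position (1.0, 0.0) carries two elevations
```

A data set that passes this check defines at most one elevation per location, which is exactly why overhangs and solids fall outside its reach.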

Solid representation is an important feature for some dedicated GIS application
domains. Two of these are worth mentioning here: mineral exploration, where
solids are used to represent ore bodies, and urban models, where solids may
represent various human constructions like buildings and sewer canals. The
three-dimensional characteristics of such objects are fundamental as their depth
and volume may matter, or their real life visibility must be faithfully
represented.

A solid can be defined as a true 3D object. An important class of solids in 3D GIS
is formed by the polyhedra, which are the solids limited by planar facets. A facet
is a polygon-shaped, flat side that is part of the boundary of a polyhedron. Any
polyhedron has at least four facets; this happens to be the case for the 3-simplex.
Most polyhedra have many more facets; the cube already has six.

2.3.5 Scale and resolution

In the practice of spatial data handling, one often comes across questions like 
"what is the resolution of the data?" or "at what scale is your data set?" Now 
that we have moved firmly into the digital age, these questions sometimes defy 
an easy answer. 

Map scale can be defined as the ratio between the distance on a paper map and
the distance of the same stretch in the terrain. A 1:50,000 scale map means that
1 cm on the map represents 50,000 cm, i.e. 500 m, in the terrain. 'Large-scale'
means that the ratio is large, so typically it means there is much detail, as in
a 1:1,000 paper map. 'Small-scale' in contrast means a small ratio, hence less
detail, as in a 1:2,500,000 paper map. When applied to spatial data, the term
resolution is commonly associated with the cell width of the tessellation applied.
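The scale arithmetic above can be written as a one-line conversion. The function name is our own.

```python
def ground_distance_m(map_distance_cm, scale_denominator):
    """Terrain distance in metres for a distance measured on a 1:scale_denominator map."""
    return map_distance_cm * scale_denominator / 100.0


# 1 cm on a 1:50,000 map corresponds to 500 m in the terrain.
example = ground_distance_m(1.0, 50_000)
```

The same centimetre on a 1:1,000 map covers only 10 m, which is why a large-scale map can show much more detail.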

Digital spatial data, as stored in a GIS, is essentially without scale: scale is a 
ratio notion associated with visual output, like a map or on-screen display, not 
with the data that was used to produce the map. We will later see that digital 
spatial data can be obtained by digitizing a paper map (Section 5.1.2), and in this 
context we might informally say that the data is at this-or-that scale, indicating 
the scale of the map from which the data was derived. 

When digital spatial data sets have been collected with a specific map-making 
purpose in mind, and these maps were designed to be of a single map scale, like 
1:25,000, we might suppose that the data carries the characteristics of "a 1:25,000 
digital data set." 

2.3.6 Representations of geographic fields

In the above, we have looked at various representation techniques. Now we can 
study which of them can be used to represent a geographic field. 

A geographic field can be represented through a tessellation, through a TIN or
through a vector representation. The choice between them is determined by the
requirements of the application at hand. It is more common to use tessellations,
notably rasters, for field representation, but vector representations are in use
too. We have already looked at TINs. We provide an example of the other two below.

Raster representation of a field

In Figure 2.17, we illustrate how a raster represents a continuous field like
elevation. Different shades of blue indicate different elevation values, with
darker blues indicating higher elevations. The choice of a blue colour spectrum
is only to make the illustration aesthetically pleasing; real elevation values
are stored in the raster, so instead we could have printed a real number value in
each cell. This would not have made the figure very legible, however.



Figure 2.17: A raster representation (in part) of the elevation of the study area
of Figure 2.2. Actual elevation values are indicated as shades of blue. The
depicted area is the north-east flank of the mountain in the south-east of the
study area. The right-hand side of the figure is a zoomed-in part of that of the
left.


A raster can be thought of as a long list of field values: actually, there should
be m × n such values. The list is preceded with some extra information, like
a single georeference as the origin of the whole raster, a cell size indicator, the
integer values for m and n, and a data type indicator for interpreting cell values.
Rasters and quadtrees do not store the georeference of each cell, but infer it from
the above information about the raster.
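A minimal sketch of this structure shows how a cell's georeference is inferred rather than stored. Several details are assumptions for the example: the origin is taken to be the upper-left corner, m counts rows and n counts columns, values are stored row by row, and the data type indicator is left out.

```python
from dataclasses import dataclass


@dataclass
class Raster:
    origin_x: float   # georeference of the raster origin (assumed upper-left corner)
    origin_y: float
    cell_size: float  # cell width and height in map units
    m: int            # number of rows (assumption)
    n: int            # number of columns (assumption)
    values: list      # m * n field values, stored row by row

    def value(self, row, col):
        """Field value of a cell, looked up in the flat value list."""
        return self.values[row * self.n + col]

    def cell_centre(self, row, col):
        """Georeference of a cell centre, inferred from origin and cell size."""
        x = self.origin_x + (col + 0.5) * self.cell_size
        y = self.origin_y - (row + 0.5) * self.cell_size
        return x, y


grid = Raster(1000.0, 2000.0, 10.0, 2, 3, [1, 2, 3, 4, 5, 6])
```

Only six numbers plus a short header are stored here; the positions of all six cells follow from the header.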

A TIN is a much 'sparser' data structure: the amount of data stored is less if
we try to obtain a structure with approximately equal interpolation error, as
compared to a regular raster. The quality of the TIN depends on the choice of
anchor points, as well as on the triangulation built from it. It is, for instance,
wise to perform 'ridge following' during the data acquisition process for a TIN.
Anchor points on elevation ridges will assist in correctly representing peaks
and mountain slope faces.

Vector representation of a field

We briefly mention a final representation for fields like elevation, but using a
vector representation. This technique uses isolines of the field. An isoline is a
linear feature that connects the points with equal field value. When the field is
elevation, we also speak of contour lines. The elevation of the Falset study area
is represented with contour lines in Figure 2.18. Both TINs and isoline
representations use vectors.



Figure 2.18: A vector-based elevation field representation for the study area of
Figure 2.2. Indicated are elevation isolines at a resolution of 25 metres.
Data source: Department of Earth Systems Analysis (ESA, ITC)




Isolines as a representation mechanism are not very common, however. They are
in use as a geoinformation visualization technique (in mapping, for instance), but
a TIN is commonly the better choice for representing this type of field.
Many GIS packages provide functions to generate an isoline visualization from
a TIN.

2.3.7 Representation of geographic objects

The representation of geographic objects is most naturally supported with
vectors. After all, objects are identified by the parameters of location, shape,
size and orientation (see Section 2.2.4), and many of these parameters can be
expressed in terms of vectors. However, tessellations are still commonly used for
representing geographic objects as well, and we discuss why below.

Tessellations to represent geographic objects

Remotely sensed images are an important data source for GIS applications.
Unprocessed digital images contain many pixels, with each pixel carrying a
reflectance value. Various techniques exist to process digital images into
classified images that can be stored in a GIS as a raster. Image classification
attempts to characterize each pixel into one of a finite list of classes, thereby
obtaining an interpretation of the contents of the image. The classes recognized
can be crop types as in the case of Figure 2.19 or urban land use classes as in
the case of Figure 2.20. These figures illustrate the unprocessed images (a) as
well as a classified version of the image (b).



Figure 2.19: An unprocessed digital image (a) and a classified raster (b) of an
agricultural area.


The application at hand may be interested only in geographic objects identified
as potato fields (Figure 2.19(b), in yellow) or industrial complexes
(Figure 2.20(b), in orange). This would mean that all other classes are
considered unimportant, and are probably dropped from further analysis. If that
further analysis can be carried out with raster data formats, then there is no
need to consider vector representations.


Figure 2.20: An unprocessed digital image (a) and a classified raster (b) of an
urban area.


How the process of image classification takes place is not the subject of this book. 
It is dealt with in Principles of Remote Sensing [53]. Nonetheless, we must make 
a few observations regarding the representation of geographic objects in rasters. 
Area objects are conveniently represented in raster, albeit that area boundaries 
may appear as jagged edges. This is a typical by-product of raster resolution 
versus area size, and artificial cell boundaries. One must be aware, for instance, 
of the consequences for area size computations: what is the precision with which 
the raster defines the object's size? 

Line and point objects are more awkward to represent using rasters. After all, we
could say that rasters are area-based, and geographic objects that are perceived
as lines or points are perceived to have zero area size. Standard classification
techniques, moreover, may fail to recognize these objects as points or lines.

Many GISs do offer support for line representations in raster, and operations on
them. Lines can be represented as strings of neighbouring raster cells with equal
value, as is illustrated in Figure 2.21. Supported operations are connectivity
operations and distance computations. There is again an issue of precision of
such computations.
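One well-known way to turn a straight line into a string of neighbouring cells is Bresenham's line algorithm. The text does not say which method a given GIS uses, so the sketch below is only an illustration, with integer cell indices as input.

```python
def rasterize_line(x0, y0, x1, y1):
    """Cells visited by Bresenham's line algorithm between two raster cells."""
    cells = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    step_x = 1 if x0 < x1 else -1
    step_y = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        cells.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:   # step in the x direction
            err += dy
            x0 += step_x
        if e2 <= dx:   # step in the y direction
            err += dx
            y0 += step_y
    return cells
```

Counting cells along such a string gives only an approximate length. A diagonal of three cells, for example, spans a true distance of about 2.83 cell widths between its end cell centres, which illustrates the precision issue mentioned above.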



Figure 2.21: An actual 
straight line (in black) and 
its representation (light 
green cells) in a raster. 

Vector representations for geographic objects

The arguably more natural way to represent geographic objects is by vector
representations. We have discussed most issues already in Section 2.3.3, and a
small example suffices at this stage.



Figure 2.22: Various objects (buildings, bike and road lanes, railroad tracks)
represented as area objects in a vector representation.


In Figure 2.22, a number of geographic objects in the vicinity of the ITC building
have been depicted. These objects are represented as area representations in a
boundary model. Nodes and vertices of the polylines that make up the objects'
boundaries are not illustrated, though they obviously are stored.

2.4 Organizing and managing spatial data

In the previous sections, we have discussed various types of geographic
information and ways of representing them. We have looked at case-by-case
examples; however, we have purposefully avoided looking at how various sorts of
spatial data are combined in a single system.



Figure 2.23: Different rasters can be overlaid to look for spatial correlations.


The main principle of data organization applied in GIS systems is that of a
spatial data layer. A spatial data layer is either a representation of a
continuous or discrete field, or a collection of objects of the same kind.
Usually, the data is organized so that similar elements are in a single data
layer. For example, all telephone booth point objects would be in one layer, and
all road line objects in another. A data layer contains spatial data (of any of
the types discussed above) as well as attribute (or: thematic) data, which
further describes the field or objects in the layer. Attribute data is quite
often arranged in tabular form, maintained in some kind of geodatabase, as we
will see in Chapter 3. An example of two field data layers is provided in
Figure 2.23.

Data layers can be overlaid with each other, inside the GIS package, so as to
study combinations of geographic phenomena. We shall see later that a GIS can
be used to study the spatial relationships between different phenomena, requiring
computations which overlay one data layer with another. This is schematically
depicted in Figure 2.24 for two different object layers. Field layers can also be
involved in overlay operators. Chapter 3 will discuss the functions offered by
GISs and database systems for data management in more detail.
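For two raster layers on the same grid, an overlay can be sketched as a cell-by-cell combination. The layer contents, the threshold and the function name below are made up for illustration.

```python
def overlay(layer_a, layer_b, combine):
    """Combine two rasters of identical dimensions cell by cell."""
    same_shape = len(layer_a) == len(layer_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(layer_a, layer_b)
    )
    if not same_shape:
        raise ValueError("layers must share the same grid")
    return [
        [combine(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


land_use = [["forest", "crop"], ["crop", "urban"]]
elevation = [[820, 310], [640, 295]]

# Which cells hold crops above 500 m?
high_crop = overlay(land_use, elevation, lambda use, z: use == "crop" and z > 500)
```

The result is itself a raster on the same grid, so it can serve as a new data layer in further overlays.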



Figure 2.24: Two different object layers can be overlaid to look for spatial
correlations, and the result can be used as a separate (object) layer.

2.5 The temporal dimension

Besides having geometric, thematic and topological properties, geographic
phenomena are also dynamic; they change over time. For an increasing number of
applications, these changes themselves are the key aspect of the phenomenon
to study. Examples include identifying the owners of a land parcel in 1972, or
how land cover in a certain area changed from native forest to pastures over
a specific time period. We can note that some features or phenomena change
slowly, such as geological features, or as in the example of land cover given
above. Other phenomena change very rapidly, such as the movement of people
or atmospheric conditions. For different applications, different scales of
measurement will apply.

Examples of the kinds of questions involving time include: 

• Where and when did something happen? 

• How fast did this change occur? 

• In which order did the changes happen? 

The way we represent relevant components of the real world in our models can
influence the kinds of questions we can or cannot answer. This chapter has
already discussed representation issues for spatial features, but has so far
ignored the problematic issues for incorporating time. The main reason lies in
the fact that GISs still offer limited support for the representation of time.
As a result, most studies require substantial efforts from the GIS user in data
preparation and data manipulation. Also, just like the 2D or 3D space in which we
represent an object or field, the temporal dimension is of a continuous nature.
Therefore in order to represent it in a computer, we have to 'discretize' the
time dimension.

Spatiotemporal data models are ways of organizing representations of space and
time in a GIS. Several representation techniques have been proposed in the
literature. Perhaps the most common of these is a 'snapshot' state that
represents a single point in time of an ongoing natural or man-made process. We
may store a series of these snapshot states to represent change, but must be
aware that this is by no means a comprehensive representation of that process.
Further discussion of spatiotemporal data models is outside the scope of this
book, and readers are referred to Langran [33] for a discussion of relevant
concepts and issues. Here we will present a brief examination of different
'concepts' of time.

• Discrete and continuous time: Time can be measured along a discrete or
continuous scale. Discrete time is composed of discrete elements (seconds,
minutes, hours, days, months, or years). In continuous time, no such
discrete elements exist, and for any two different points in time, there is
always another point in between. We can also structure time by events
(points in time) or periods (time intervals). When we represent time periods
by a start and end event, we can derive temporal relationships between
events and periods such as 'before', 'overlap', and 'after'.

• Valid time and transaction time: Valid time (or world time) is the time when 
an event really happened, or a string of events took place. Transaction time 
(or database time) is the time when the event was stored in the database 
or GIS. Observe that the time at which we store something in the data¬ 
base/ GIS typically is (much) later than when the related event took place. 

• Linear, branching and cyclic time: Time can be considered to be linear, ex- 
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tending from the past to the present ('now'), and into the future. This view 
gives a single time line. For some types of temporal analysis, branching 
time—in which different time lines from a certain point in time onwards 
are possible—and cyclic time—in which repeating cycles such as seasons 
or days of a week are recognized, make more sense and can be useful. 

• Time granularity: When measuring time, we speak of granularity as the 
precision of a time value in a GIS or database (e.g. year, month, day, sec¬ 
ond, etc.). Different applications may obviously require different granu¬ 
larity. In cadastral applications, time granularity might well be a day, as 
the law requires deeds to be date-marked; in geological mapping applica¬ 
tions, time granularity is more likely in the order of thousands or millions 
of years. 

• Absolute and relative time: Time can be represented as absolute or relative. 

Absolute time marks a point on the time line where events happen (e.g. '6 
July 1999 at 11:15 p.m.'). Relative time is indicated relative to other points 
in time (e.g. 'yesterday', 'last year', 'tomorrow', which are all relative to 
'now', or 'two weeks later', which is relative to some other arbitrary point 
in time.). 
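
Some of the concepts above can be made concrete with a minimal sketch. It
assumes a granularity of one day and periods given by a start and an end
event; the names Period, ParcelRecord and relation, and all dates and values,
are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Period:
    """A time period bounded by a start event and an end event (inclusive)."""
    start: date
    end: date


def relation(a: Period, b: Period) -> str:
    """Return how period a relates to period b: 'before', 'after' or 'overlap'."""
    if a.end < b.start:
        return "before"
    if a.start > b.end:
        return "after"
    return "overlap"


@dataclass(frozen=True)
class ParcelRecord:
    """Hypothetical record carrying both valid time and transaction time."""
    parcel_id: str
    land_use: str
    valid: Period        # valid time: when the land use held in the real world
    recorded_on: date    # transaction time: when the record was stored


survey_1992 = Period(date(1992, 1, 1), date(1992, 12, 31))
survey_1993 = Period(date(1993, 1, 1), date(1993, 12, 31))
dry_season = Period(date(1992, 12, 1), date(1993, 3, 31))

record = ParcelRecord("P-17", "pasture", survey_1993, date(1994, 2, 10))

print(relation(survey_1992, survey_1993))     # before
print(relation(dry_season, survey_1993))      # overlap
print(record.recorded_on > record.valid.end)  # True: stored after the fact
```

Choosing date rather than a finer type fixes the granularity at one day; a
geological application would more likely store a year count instead.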

Part of an example data set from a project investigating change is provided in
Figure 2.25. The purpose of this particular study was to assess whether radar
images are reliable resources for detecting the disappearance of primary
forests [6]. This area of work is commonly known as change detection. Studies
of this type are usually based on some 'model of change', which includes
knowledge and hypotheses of how change occurs for the specific phenomena being
studied. In this case, it included knowledge about speed of tree growth.

In spatiotemporal analyses we consider changes of spatial and thematic
attributes over time. We can keep the spatial domain fixed and look only at the
attribute changes over time for a given location in space. We might be
interested in how land cover changed for a given location, or how the land use
changed for a given land parcel over time, provided its boundary did not
change. On the other hand, we can keep the attribute domain fixed and consider
the spatial changes over time for a given thematic attribute. In this case, we
might want to identify locations that were covered by forest over a given
period of time. Finally, we can assume both the spatial and attribute domain
variable and consider how fields or objects changed over time. This may lead to
notions of object motion, a subject receiving increasing attention in the
literature. Applications of moving object research include traffic control,
mobile telephony, wildlife tracking, vector-borne disease control, and weather
forecasting.
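
The first two kinds of analysis can be sketched on a series of snapshot
states. The tiny rasters, class codes and function names below are invented
for illustration and are not taken from the study shown in Figure 2.25.

```python
# Hypothetical snapshot series: one small land cover raster per year.
# "F" = primary forest, "S" = secondary vegetation, "P" = pasture.
snapshots = {
    1992: [["F", "F", "S"],
           ["F", "P", "P"]],
    1993: [["F", "S", "S"],
           ["F", "P", "P"]],
    1994: [["F", "S", "P"],
           ["S", "P", "P"]],
}


def history_at(row, col):
    """Spatial domain fixed: attribute values over time for one cell."""
    return [(year, grid[row][col]) for year, grid in sorted(snapshots.items())]


def always_covered_by(cover):
    """Attribute domain fixed: cells holding `cover` in every snapshot."""
    grids = list(snapshots.values())
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    return [(r, c)
            for r in range(n_rows)
            for c in range(n_cols)
            if all(grid[r][c] == cover for grid in grids)]


print(history_at(0, 1))        # [(1992, 'F'), (1993, 'S'), (1994, 'S')]
print(always_covered_by("F"))  # [(0, 0)]
```

Because only three states are stored, nothing is known about what happened
between the snapshots, which is the limitation of this representation noted
earlier.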

In these types of applications, the problem of object identity becomes
apparent. When does a change or movement cause an object to disappear and
become a new one? With wildlife this is quite obvious; with weather systems
less so. But this should no longer surprise the reader: we have already seen
that some geographic phenomena can be nicely described as objects, while others
are better represented as fields.





Figure 2.25: The change of land cover in a 9 x 14 km study site near San Jose
del Guaviare, div. Guaviare, Colombia, during a study conducted in 1992-1994 by
Bijker [6]. A time series of ERS-1 radar images after application of (1) image
segmentation, (2) rule-based image classification, and (3) further
classification using a land cover change model. The land cover classes are:

■ primary forest,
■ secondary vegetation,
■ secondary vegetation with Cecropia trees,
■ pasture, and
■ pasture & secondary vegetation.

Data source: Wietske Bijker, ITC.

Summary


Geographic phenomena are present in the real world that we study; their
computer representations only live inside computer systems. This chapter has
discussed different types of geographic phenomena, and examined the ways that
these can be represented in a computer system, such as a GIS.

An important distinction between phenomena is whether a phenomenon is
omnipresent (i.e. occurring everywhere in the study area) or whether its
constituents somehow 'sparsely' populate the study area. The first class of
phenomena we called fields, the second class objects. Amongst fields, we
identified continuous and discrete phenomena. Continuous phenomena could even
be differentiable, meaning that for each location factors such as gradient and
aspect can be determined. Amongst objects, important classification parameters
include location, shape, size and orientation. The dimension of an object is a
fundamental part of its shape parameter: is it a point, line, area or volume
object? In all cases, a representation of the boundary of the object (whether
crisp or fuzzy) is often used in GIS.

The second half of the chapter elaborated on the techniques with which the
above phenomena are actually stored in a computer system. The fundamental
problem in obtaining realistic representations is that the phenomena are
usually continuous in nature, thus requiring an infinite data collection to
represent them faithfully. As a consequence of the finite memory that we have
available in computer systems, we must accept finite representations. This
leads to approximations, and therefore error, in our GIS data.

Questions

1. For your own GIS application domain, make up a list of at least 20 different 
geographic phenomena that might be relevant. 

2. Take the list of question 1 and identify which phenomena are fields and 
which are objects. Which of your example objects are crisp? 

3. There is an obvious natural relationship between remotely sensed images 
and geographic fields, as we have defined them in this chapter, yet the 
two are not the same thing. Elaborate on this, and discuss what are the 
differences. 

4. Location, shape, size and orientation are potentially relevant
characteristics of geographic objects. Try to provide an application example in
which these characteristics do make sense for (a) point objects, (b) line
objects, (c) area objects.

5. On page 70, we stated a rule-of-thumb, namely that natural phenomena
are more often fields, whereas man-made phenomena are more often objects.
Provide counter-examples from a GIS application domain to this rule:
name at least one natural phenomenon that is better perceived as object(s),
and name a man-made phenomenon that is better perceived as field. (The
latter is more difficult.)

6. On page 108, we provided the (logical) definition of the 'meets'
relationship. Provide your version of the definitions of 'covered by' and
'overlaps'. Explain why this set of topological relationships between regions
is also known as the four-intersection scheme.

7. What colour is the northwest quadrant of the outermost northeast quadrant
of Figure 2.7? First check the field on the left, then use the quadtree on
the right. What colour is the southeast quadrant of the outermost northeast
quadrant?

8. Make an educated guess at the elevation of location P in Figure 2.8. What
are the gradient and the aspect of the slope in this location, approximately?
In a second stage, do this again, now based on the tessellations of Figure 2.9
(first the left one, then the right one).

9. Explain how many line objects and how many line segments are illustrated
in Figure 2.12. Complete the table on the left, using a numbering of vertices
that you have made up yourself for Figure 2.11.

10. In this chapter we have discussed raster-based and vector-based
representations of geographic phenomena. We have not explicitly discussed what
are the advantages and disadvantages of either. What do you think they
are?

11. In Figure 2.21, we presented an actual line, and its representation in the
raster. Compute the real length of the line (taking cell width as the unit).
In rasters, when a GIS computes a distance it uses 1 as the distance between
two cells that share a side, and it uses √2 as the distance between two cells
that share only a corner point. What would be the computed length by the
GIS of the line's representation in Figure 2.21? What can be said in general
about the two lengths?


12. We have emphasized throughout the chapter that GIS representations of
geographic phenomena are necessarily finite, notwithstanding the naturally
continuous or curvilinear nature of the objects that we study. We are
thus approximating, and are making errors by doing so. Do you think
there is any way of computing what the errors are that we are making?

13. What observations can be made from a visual interpretation of Figure 2.25?
What changes do you 'detect'? Which stages of change?

























Chapter 3

Data management and processing systems


The ability to manage and process spatial data is a critical component for any
functioning GIS. Simply put, data processing systems are hardware and software
components which are able to process, store and transfer data. This chapter
discusses the components of systems that facilitate the management and
processing of geoinformation. To provide some background, the chapter begins
with a brief discussion of computer hardware and software trends.

In Section 3.4, we discuss database management systems (DBMSs), and illustrate
some principles and methods of data extraction from a database. The final
section of the chapter (Section 3.5) looks at the merging of GIS and DBMS, and
the emergence of spatial databases in recent years. It notes their key
advantages, and briefly illustrates the use of a spatial database for data
storage and processing.

3.1 Hardware and software trends


Advances in computer hardware seem to take place at an ever-increasing rate.
Every several months, a faster, more powerful processor generation replaces the
previous one. Computers are also becoming increasingly portable, while offering
this increased performance. The computing power that we have available in
today's handheld computers is a multiple of the performance that the first PC
had when it was introduced in the early 1980s. In fact, current PCs have orders
of magnitude more memory and storage capacity than the so-called minicomputers
of 25 years ago. To illustrate this trend: compare a typical early 1980s PC
with a 2 MHz CPU, 128 kbytes of main memory, and a 10 MByte hard disk to the
current generation of desktop PCs. Table 3.1 shows the list of standard unit
prefixes for reference purposes.

Computers are also becoming increasingly affordable. Hand-held computers are
now commonplace in business and personal use, equipping field surveyors with
powerful tools, complete with GPS capabilities for instantaneous
georeferencing. To support these hardware trends, software providers continue
to produce application programs and operating systems that, while providing a
lot more functionality, also consume significantly more memory. In general,
software technology has developed somewhat more slowly and often cannot fully
utilise the possibilities offered by the exponentially growing hardware
capabilities. Existing software obviously performs better when run on faster
computers.

Alongside these trends, there have also been significant developments in
computer networks. In essence, today almost any computer on Earth can connect
to some network, and contact computers virtually anywhere else, allowing fast
and reliable exchange of (spatial) data. Mobile phones are more and more
frequently being used to connect to computers on the Internet. The UMTS
protocol (Universal Mobile Telecommunications System) allows digital
communication of text, audio, and video at a rate of approximately 2 Mbps. The
new HSDPA protocol offers up to 10 times this speed. Looking at these
developments it is clear that the combination of a GPS receiver, a portable
computer and mobile phone has already dramatically changed our world, certainly
so for out-of-office activities of Earth science professionals.


prefix   m      c      d      h      k     M     G     T      P      E
name     milli  centi  deci   hecto  kilo  mega  giga  tera   peta   exa
factor   10^-3  10^-2  10^-1  10^2   10^3  10^6  10^9  10^12  10^15  10^18

Table 3.1: Commonly used unit prefixes


Bluetooth version 2.0 is a standard that offers up to 3 Mbps connections,
especially between palm- and laptop computers and their peripheral devices,
such as a mobile phone, GPS or printer at short range. Wireless LANs (Local
Area Networks), under the so-called WiFi standard, nowadays offer a bandwidth
of up to 108 Mbps on a single connection point, to be shared between computers.
They are more and more used for constructing a computer network in office
buildings and in private homes.

When the medium of communication is not the air, but copper or fibre optics
cables (structured networks), the picture is a different one. Standard
'dial-up' telephone modems allow rates up to 56 kbps. Digital telephone links
(ISDN) support much higher rates: up to 1.5 Mbps. ADSL technology, widely
available through telephone companies on standard copper-wire networks,
supports transfer rates anywhere between 2 and 20 Mbps towards the customer
(downstream), and between 1 and 8 Mbps towards the network (upstream),
depending on the internet provider and quality of the network infrastructure.

Wide-area computer networks (national, continental, global) have a capacity of
several Gbps. ITC's dedicated Local Area Network (LAN), which is partially
fibre optics-based, supports a transmission rate locally of 1 Gbps. Since fibre
optic cables in principle support rates of several Gbps, it is unlikely that
this bandwidth capacity will be exceeded in the very near future.
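
To get a feel for what these rates mean in practice, the following sketch
computes how long a data set would take to transfer at some of the nominal
rates quoted in this section. It assumes the decimal prefixes of Table 3.1,
ignores protocol overhead and sharing of the link, and uses a hypothetical
100 MByte data set.

```python
# Nominal link rates quoted in this section, in bits per second,
# using the decimal prefixes of Table 3.1 (k = 10^3, M = 10^6, G = 10^9).
RATES_BPS = {
    "dial-up modem": 56e3,
    "UMTS": 2e6,
    "WiFi": 108e6,
    "fibre LAN": 1e9,
}


def transfer_seconds(size_mbyte, rate_bps):
    """Seconds needed to move size_mbyte (10^6 bytes each) at rate_bps."""
    bits = size_mbyte * 1e6 * 8  # 8 bits per byte
    return bits / rate_bps


# Hypothetical 100 MByte data set, e.g. a scanned map sheet.
for link, rate in RATES_BPS.items():
    print(f"{link:14s} {transfer_seconds(100, rate):10.1f} s")
```

Under these assumptions the same data set needs about four hours over a
dial-up modem, 400 seconds over UMTS, and under one second on the fibre LAN.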

3.2 Geographic information systems


Chapter 1 identified that a GIS provides a range of capabilities to handle
georeferenced data, including:

1. Data capture and preparation, 

2. Data management (storage and maintenance), 

3. Data manipulation and analysis, and 

4. Data presentation. 

For many years, analogue data sources were used, processing was done manually,
and paper maps were produced. The introduction of modern techniques has led to
an increased use of computers and digital information in all aspects of spatial
data handling. The software technology used in this domain is centered around
geographic information systems.

Typical planning projects require data sources, both spatial and non-spatial,
from different national institutes, like national mapping agencies, geological,
soil, and forest survey institutes, and national census bureaus. The data
sources obtained may be from different time periods, and the spatial data may
be in different scales or projections. With the help of a GIS, the spatial data
can be stored in digital form in world coordinates. This makes scale
transformations unnecessary, and the conversion between map projections can be
done easily with the software. With the spatial data thus prepared, spatial
analysis functions of the GIS can then be applied to perform the planning
tasks.
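
As an illustration of what a projection conversion involves, the sketch below
converts geographic coordinates to and from one particular projection, the
Mercator projection on a sphere. The choice of projection, the spherical Earth
model, the radius value and the sample point are all assumptions made for this
example; a GIS would normally use an ellipsoidal model and support many
projections.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical model assumed


def to_mercator(lon_deg, lat_deg):
    """Project geographic coordinates (degrees) to spherical Mercator (metres)."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lam
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


def from_mercator(x, y):
    """Inverse projection: spherical Mercator (metres) back to degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


# Hypothetical point, roughly in the east of the Netherlands.
x, y = to_mercator(6.9, 52.2)
lon, lat = from_mercator(x, y)
print(round(lon, 6), round(lat, 6))
```

Because the data are kept in world coordinates, the same stored point can be
drawn at any map scale; only the projection step changes its planar position.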

What remains is to pay careful attention to the quality (or lack of it) in the
different datasets, to ensure that unnecessary error is not being introduced.
Chapter 5 discusses these issues in more detail. In this chapter we will focus
on GIS software and ways to manage spatial and attribute data.

3.2.1 GIS software

As noted previously, GIS can be considered to be a data store (i.e. a system that 
stores spatial data), a toolbox, a technology, an information source or a field of 
science. The main characteristics of a GIS software package are its analytical 
functions that provide means for deriving new geoinformation from existing 
spatial and attribute data. 

The use of tools for problem solving is one thing, but the production of these 
tools is something quite different. Not all tools are equally well-suited for a 
particular application, and they can be improved and perfected to better serve 
a particular need or application. The discipline of geographic information science 
is driven by the use of our GIS tools, and these are in turn improved by new 
insights and information gained through their application in various scientific 
fields. Spatial information theory is one such field, which focuses specifically on 
providing the background for the production of tools for the handling of spatial 
data. 

All GIS packages available on the market have their strengths and weaknesses,
typically resulting from the development history and/or intended application
domain(s) of the package. Some GISs have traditionally focused more on support
for raster-based functionality, others more on (vector-based) spatial objects.
We can safely state that any package that provides support for only rasters or
only objects is not a complete GIS. Well-known, full-fledged GIS packages
include ILWIS, Intergraph's GeoMedia, ESRI's ArcGIS, and MapInfo from MapInfo
Corp. Several of these systems are used within ITC in practical sessions of the
Principles of GIS teaching module. This textbook attempts to describe the field
of GIS independently from specific software packages, as 'principles' should be
useful to users of any package.

There is no particular GIS package which is necessarily 'better' than another
one: this depends on factors such as the intended application, and the
expertise of its user. ILWIS's traditional strengths are in raster processing
and scientific spatial data analysis, especially in project-based GIS
applications. Intergraph, ESRI and MapInfo products have been known better for
their support of vector-based spatial data and their operations, user interface
and map production (a bit more typical of institutional GIS applications). Any
such brief characterization, however, fails to do justice to any of these
packages, and it is only after extended use that their strengths, and sometimes
weaknesses, might become clear.

3.2.2 GIS architecture and functionality

We have already noted that a geographic information system in the wider sense
consists of software, data, people, and an organization in which it is used.
Before moving on, we should also note that organizational factors will define
the context and rules for the capture, processing and sharing of
geoinformation, as well as the role which GIS plays in the organization as a
whole. In the remainder of this chapter we focus on the architecture and
functional components of GIS software.

As noted above, a GIS consists of several functional components, i.e.
components which support key GIS functions. These are data capture and
preparation, data storage, data analysis, and presentation of spatial data.
Figure 3.1 shows a diagram of these components, with arrows indicating the data
flow in the system.

For a particular GIS, each of these components may provide many or only a few
functions. Arguably, the system should not be called a geographic information
system if any one of these components is missing. It is important to note,
however, that the same function may be offered by different components of the
GIS: for instance, data capture and data storage may have functions in common,
and the same holds for data preparation and data analysis.

The following sections briefly describe these components, focussing on storage 
and maintenance. Later in this chapter, we will discuss the role of DBMS and 
more specifically, spatial databases, in the storage and maintenance of geospatial 
data. A more detailed treatment of the other functional components in Figure 3.1 
can be found in follow-up chapters. 

Figure 3.1: Functional components of a GIS

3.2.3 Spatial Data Infrastructure (SDI)

For reasons that include efficiency and legislation, many organizations are
forced to work in a cooperative setting in which geographic information is
obtained from, and provided to, partner organizations and the general public.
The sharing of spatial data between the various GISs in those organizations is
of key importance, and aspects of data dissemination, security, copyright and
pricing require special attention. The design and maintenance of a Spatial Data
Infrastructure (SDI) deals with these issues.

In [42] an SDI is defined as "the relevant base collection of technologies,
policies and institutional arrangements that facilitate the availability of and
access to spatial data". Fundamental to those arrangements are, in a wider
sense, the agreements between organizations and, in the narrow sense, the
agreements between software systems on how to share the geographic information.
In SDI, standards are often the starting point for those agreements. Standards
exist for all facets of GIS, ranging from data capture to data presentation.
They are developed by different organizations, of which the most prominent are
the International Organization for Standardization (ISO) and the Open
Geospatial Consortium (OGC).

Typically, an SDI provides its users with different facilities for finding,
viewing, downloading and processing data. Because the organizations in an SDI
are normally widely distributed over space, computer networks are used as the
means of communication. With the development of the internet, the functional
components of GIS have gradually become available as web-based applications.
Much of the functionality is provided by so-called geo-webservices, software
programs that act as an intermediary between geographic data(bases) and the
users of the web. Geo-webservices can vary from a simple map display service to
a service which involves complex spatial calculations. For their spatial data
handling, these services commonly use standardized raster and vector
representations following the above-mentioned standards.
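
A simple map display service can be illustrated with the OGC Web Map Service
(WMS) standard, which is not named in the text above but is one widely used
example of such a standard. The sketch below only builds the request URL for a
WMS 1.3.0 GetMap operation; the endpoint, layer name and extent are
hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical service endpoint and layer name, used for illustration only.
ENDPOINT = "https://example.org/geoserver/wms"


def getmap_url(layer, bbox, width, height):
    """Build a WMS 1.3.0 GetMap request URL for one layer.

    bbox is (min_lon, min_lat, max_lon, max_lat) in CRS:84 (lon/lat order).
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty value requests the default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return ENDPOINT + "?" + urlencode(params)


url = getmap_url("landcover", (6.8, 52.1, 7.0, 52.3), 512, 512)
print(url)
```

The service would answer such a request with a raster image of the requested
extent, so the client needs no GIS software of its own to display the map.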

3.3 Stages of spatial data handling

3.3.1 Spatial data capture and preparation

The functions for capturing data are closely related to the disciplines of
surveying engineering, photogrammetry, remote sensing, and the processes of
digitizing, i.e. the conversion of analogue data into digital representations.
Remote sensing, in particular, is the field that provides photographs and
images as the raw base data from which spatial data sets are derived. Surveys
of the study area often need to be conducted for data that cannot be obtained
with remote sensing techniques, or to validate data thus obtained.

Traditional techniques for obtaining spatial data, typically from paper
sources, included manual digitizing and scanning. Table 3.2 lists the main
methods and devices used for data capture. In recent years there has been a
significant increase in the availability and sharing of digital (geospatial)
data. As discussed above, various media and computer networks play an important
role in the dissemination of this data, particularly the internet.

The data, once obtained in some digital format, may not be quite ready for use
in the system. This may be because the format obtained from the capturing
process is not quite the format required for storage and further use, which
means that some type of data conversion is required. In part, this problem may
also arise when the captured data represents only raw base data, out of which
the real data objects of interest to the system still need to be constructed.
For example, semi-automatic digitizing may produce line segments, while the
application requires non-overlapping polygons. A build-and-verification phase
would then be needed to obtain these from the captured lines.

Issues related to data acquisition and preparation are discussed in greater
detail in Chapter 5.

Method                           Devices
Manual digitizing                • coordinate entry via keyboard
                                 • digitizing tablet with cursor
                                 • mouse cursor on the computer monitor
                                   (heads-up digitizing)
                                 • (digital) photogrammetry
Automatic digitizing             • scanner
Semi-automatic digitizing        • line-following software
Input of available digital data  • CD-ROM or DVD-ROM
                                 • via computer network or internet
                                   (including geo-webservices)

Table 3.2: Spatial data input methods and devices used

3.3.2 Spatial data storage and maintenance

The way that data is stored plays a central role in the processing and the
eventual understanding of that data. In most of the available systems, spatial
data is organized in layers by theme and/or scale. For instance, the data may
be organized in thematic categories, such as land use, topography and
administrative subdivisions, or according to map scale. An important underlying
principle is that the representation of the real world has to be designed to
reflect phenomena and their relationships as naturally as possible. In a GIS,
features are represented with their (geometric and non-geometric) attributes
and relationships. The geometry of features is represented with primitives of
the respective dimension: a windmill probably as a point, an agricultural field
as a polygon. The primitives follow either the vector approach, as in the
example, or the raster approach.

As described in Chapter 2, vector data types describe an object through its
boundary, thus dividing the space into parts that are occupied by the
respective objects. The raster approach subdivides space into (regular) cells,
mostly as a square tessellation of dimension two or three. These cells are
called either cells or pixels in 2D, and voxels in 3D. In case the raster
represents a discrete field, the data indicates for every cell which real world
feature it covers. In case of a continuous field, the cell holds a
representative value for that field. Table 3.3 lists advantages and
disadvantages of raster and vector representations.


The storage of a raster is, in principle, straightforward. It is stored in a
file as a long list of values, one for each cell, preceded by a small list of
extra data (the so-called 'file header') that informs how to interpret the long
list. The order of the cell values in the list can be (but need not be)
left-to-right, top-to-bottom.

Raster representation
  advantages
    • simple data structure
    • simple implementation of overlays
    • efficient for image processing
  disadvantages
    • less compact data structure
    • difficulties in representing topology
    • cell boundaries independent of feature boundaries

Vector representation
  advantages
    • efficient representation of topology
    • adapts well to scale changes
    • allows representing networks
    • allows easy association with attribute data
  disadvantages
    • complex data structure
    • overlay more difficult to implement
    • inefficient for image processing
    • more update-intensive

Table 3.3: Raster and vector representations compared


This simple encoding scheme is known as row ordering. The header of the raster
file will typically inform how many rows and columns the raster has, which
encoding scheme is used, and what sort of values are stored for each cell.
Raster files can be quite big data sets. For computational reasons, it is wise
to organize the long list of cell values in such a way that spatially nearby
cells are also near to each other in the list. This is why other encoding
schemes have been devised. The reader is referred to [34] for a more detailed
discussion.
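
Row ordering can be sketched as a small header plus a flat list, in which the
position of a cell follows directly from its row and column. The sketch also
shows one well-known alternative cell order, the Z-order (or Morton order),
picked here only as an example of a scheme that keeps nearby cells close in
the list; the text does not say which schemes [34] covers. The class and
function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Raster:
    """A raster as a small header plus one flat list of cell values."""
    n_rows: int
    n_cols: int
    encoding: str   # header field naming the cell order, here "row"
    values: list    # one value per cell

    def cell(self, row, col):
        """Look up a cell under row ordering (left-to-right, top-to-bottom)."""
        return self.values[row * self.n_cols + col]


def morton_index(row, col, bits=16):
    """Z-order position of a cell: interleave the bits of row and col."""
    index = 0
    for i in range(bits):
        index |= ((col >> i) & 1) << (2 * i)
        index |= ((row >> i) & 1) << (2 * i + 1)
    return index


grid = Raster(n_rows=2, n_cols=3, encoding="row",
              values=["F", "F", "S",
                      "F", "P", "P"])
print(grid.cell(1, 1))  # P

# The 2 x 2 block of cells at the origin occupies positions 0..3.
print([morton_index(r, c) for r in (0, 1) for c in (0, 1)])  # [0, 1, 2, 3]
```

Under row ordering, vertically adjacent cells lie a full row length apart in
the list; under the Z-order, each 2 x 2 block of cells is stored contiguously.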

Low-level storage structures for vector data are much more complicated, and a
discussion is certainly beyond the purpose of this introductory text. The best
intuitive understanding can be obtained from Figure 2.12, where a boundary
model for polygon objects was illustrated. Similar structures are in use for
line objects. For further, advanced reading, please see [49].

GIS software packages provide support for both spatial and attribute data, i.e.
they accommodate spatial data storage using a vector approach, and attribute
data using tables. Historically, however, database management systems (DBMSs)
have been based on the notion of tables for data storage. For some time,
substantial GIS applications have been able to link to an external database to
store attribute data and make use of its superior data management functions.
Currently, all major GIS packages provide facilities to link with a DBMS and
exchange attribute data with it. Spatial (vector) and attribute data are still
sometimes stored in separate structures, although they can now be stored
directly in a spatial database. More detail on these issues is provided in
Section 3.5.
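
The separate-structures arrangement can be sketched as follows: geometry is
kept on the GIS side, attribute data in a DBMS table, and the two are linked
through a shared identifier. The in-memory SQLite database, the table layout
and all parcel data are assumptions made for this illustration.

```python
import sqlite3

# Geometry held on the GIS side: parcel id -> polygon boundary (hypothetical).
geometries = {
    101: [(0, 0), (4, 0), (4, 3), (0, 3)],
    102: [(4, 0), (9, 0), (9, 3), (4, 3)],
}

# Attribute data held in an external DBMS (in-memory SQLite for illustration).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE parcel (id INTEGER PRIMARY KEY, owner TEXT, land_use TEXT)")
db.executemany(
    "INSERT INTO parcel VALUES (?, ?, ?)",
    [(101, "Jansen", "pasture"), (102, "De Vries", "forest")],
)


def parcels_with_land_use(land_use):
    """Select ids in the DBMS, then fetch the matching geometry by shared key."""
    rows = db.execute(
        "SELECT id, owner FROM parcel WHERE land_use = ? ORDER BY id", (land_use,)
    ).fetchall()
    return [(pid, owner, geometries[pid]) for pid, owner in rows]


print(parcels_with_land_use("forest"))
```

The attribute selection is left to the DBMS, while the geometry lookup happens
outside it; in a spatial database both would be handled by one system.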

Maintenance of (spatial) data can best be defined as the combined activities to
keep the data set up-to-date and as supportive as possible to the user
community. It deals with obtaining new data, and entering them into the system,
possibly replacing outdated data. The purpose is to have an up-to-date stored
data set available. After a major earthquake, for instance, we may have to
update our road network data to reflect that roads have been washed away, or
have otherwise become impassable.

The need for updating spatial data stems from the requirements that the data
users impose, as well as the fact that many aspects of the real world change
continuously. These data updates can take different forms. It may be that a
complete, new survey has been carried out, from which an entirely new data
set is derived that will replace the current set. Such a situation is typical
if the spatial data originates from remotely sensed data, for example, a new
vegetation cover set, or a new digital elevation model. It may also be that
local (ground) surveys have revealed local changes, for instance, new
constructions, or changes in land use or ownership. In such cases, local change
to the large spatial data set is more typically required. Such local changes
should respect matters of data consistency, i.e. they should leave other
spatial data within the same layer intact and correct.
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3.3.3 Spatial query and analysis 

The most distinguishing parts of a GIS are its functions for spatial analysis, i.e.
operators that use spatial data to derive new geoinformation. Spatial queries and
process models play an important role in this functionality. One of the key uses of
GISs has been to support spatial decisions. Spatial decision support systems (SDSS)
are a category of information systems composed of a database, GIS software,
models, and a so-called knowledge engine which allow users to deal specifically
with locational problems.

In a GIS, data are usually grouped into layers (or themes). Usually, several
themes are part of a project. The analysis functions of a GIS use the spatial
and non-spatial attributes of the data in a spatial database to provide answers
to user questions. GIS functions are used for maintenance of the data, and for
analysing the data in order to infer information from it. Analysis of spatial data
can be defined as computing new information that provides new insight from
the existing, stored spatial data.

Consider an example from the domain of road construction. In mountainous
areas this is a complex engineering task with many cost factors, which include the
number of tunnels and bridges to be constructed, the total length of the tarmac,
and the volume of rock and soil to be moved. GIS can help to compute such costs
on the basis of an up-to-date digital elevation model and soil map. Maintenance
and analysis of attribute data is discussed further in Section 3.4.

The exact nature of the analysis will depend on the application requirements, but
computations and analytical functions operate on both spatial and non-spatial
data. Chapter 6 discusses these issues in more detail. For now, we will focus on
the last stage of Figure 3.1.

3.3.4 Spatial data presentation

The presentation of spatial data, whether in print or on-screen, in maps or in
tabular displays, or as 'raw data', is closely related to the disciplines of cartography,
printing and publishing. The presentation may either be an end-product, for
example as a printed atlas, or an intermediate product, as in spatial data made
available through the internet.


Method                        Devices
----------------------------  -------------------------------------------------
Hard copy                     • printer
                              • plotter (pen plotter, ink-jet printer, thermal
                                transfer printer, electrostatic plotter)
                              • film writer
Soft copy                     • computer screen
Output of digital data sets   • magnetic tape
                              • CD-ROM or DVD
                              • the Internet

Table 3.4: Spatial data presentation


Table 3.4 lists several different methods and devices used for the presentation of 
spatial data. Cartography and scientific visualization make use of these methods 
and devices to produce their products. Chapter 7 is devoted to visualization 
techniques for spatial data.

3.4 Database management systems


A database is a large, computerized collection of structured data. 


In the non-spatial domain, databases have been in use since the 1960s, for
various purposes like bank account administration, stock monitoring, salary
administration, order bookkeeping, and flight reservation systems, to name just
a few.

The common denominator between these applications is that the amount of data
is usually quite large, but the data itself has a simple and regular structure.

Designing a database is not an easy task. Firstly, one has to consider carefully
what the database purpose is, and who its users will be. Secondly, one needs
to identify the available data sources and define the format in which the data
will be organized within the database. This format is usually called the
database structure. Lastly, data can be entered into the database. It is important to
keep the data up-to-date, and it is therefore wise to set up the processes for this,
and make someone responsible for regular maintenance of the database.
Documentation of the database design and set-up is crucial for an extended database
life. Many enterprise databases tend to outlive the professional careers of their
original designers.


A database management system (DBMS) is a software package that allows 
the user to set up, use and maintain a database. 


Just as a GIS allows the set-up of a GIS application, a DBMS offers generic
functionality for database organization and data handling. Below, we will take a
closer look at what type of functions are offered by DBMSs. Many standard PCs
are equipped with a DBMS called MS Access. This package offers a useful set
of functions, although its capacity is limited: a single Access database file
cannot exceed 2 GB.

3.4.1 Reasons for using a DBMS

There are various reasons why one would want to use a DBMS for data storage 
and processing. 

• A DBMS supports the storage and manipulation of very large data sets. 

Some data sets are so big that storing them in text files or spreadsheet files
becomes too awkward for use in practice. The result may be that finding simple
facts takes minutes, and performing simple calculations perhaps even hours. A
DBMS is specifically designed for this purpose.

• A DBMS can be instructed to guard data correctness.


An important aspect of data correctness is data entry checking: ensuring
that the data that is entered into the database does not contain obvious
errors. For instance, since we know the study area we are working in, we also
know the range of possible geographic coordinates, so we can ensure the DBMS
checks them.

The above is a simple example of the type of rules, generally known as integrity
constraints, that can be defined in and automatically checked by a DBMS. More
complex integrity constraints are certainly possible, and their definition is part
of the design of a database.
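
The coordinate check described above can be sketched as a CHECK constraint. The following is a minimal illustration only, using Python's built-in sqlite3 module as the DBMS. The table name Observation, its columns, and the bounding box values are hypothetical and do not come from the example database of this chapter.

```python
import sqlite3

# Hypothetical bounding box of the study area (illustrative values only).
X_MIN, X_MAX = 250000.0, 260000.0
Y_MIN, Y_MAX = 460000.0, 470000.0

conn = sqlite3.connect(":memory:")

# The integrity constraint is declared once, as part of the database design.
conn.execute(
    f"""
    CREATE TABLE Observation (
        ObsId INTEGER PRIMARY KEY,
        X REAL NOT NULL CHECK (X BETWEEN {X_MIN} AND {X_MAX}),
        Y REAL NOT NULL CHECK (Y BETWEEN {Y_MIN} AND {Y_MAX})
    )
    """
)


def try_insert(obs_id, x, y):
    """Return True if the DBMS accepts the tuple, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Observation VALUES (?, ?, ?)", (obs_id, x, y))
        return True
    except sqlite3.IntegrityError:
        return False


inside = try_insert(1, 257462.0, 464780.0)   # within the study area
outside = try_insert(2, 999999.0, 464780.0)  # X lies outside the allowed range
print(inside, outside)
```

The point of the sketch is that the check lives in the database definition, so every program that enters data is subject to it without having to repeat the rule.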

• A DBMS supports the concurrent use of the same data set by many users.

Large data sets are built up over time, which means that substantial investments
are required to create and maintain them, and that probably many people are
involved in the data collection, maintenance and processing. These data sets are
often considered to be of a high strategic value for the owner(s), which is why
many may want to make use of them within an organization.

Moreover, for different users of the database, different views on the data can be 
defined. In this way, users will be under the impression that they operate on 
their personal database, and not on one shared by many people. They may all 
be using the database at the same time, without affecting each other's activities. 
This DBMS function is called concurrency control. 

• A DBMS provides a high-level, declarative query language.*

The most important use of the language is the definition of queries.


A query is a computer program that extracts data from the database that
meet the conditions indicated in the query.


* The word 'declarative' means that the query language allows the user to define what data
must be extracted from the database, but not how that should be done. It is the DBMS itself that
will figure out how to extract the data that is requested in the query. Declarative languages are
generally considered user-friendly because the user need not care about the 'how' and can focus
on the 'what'.

• A DBMS supports the use of a data model. A data model is a language with
which one can define a database structure and manipulate the data stored
in it.

The most prominent data model is the relational data model. We discuss it in full
in Section 3.4.3. Its primitives are tuples (also known as records, or rows) with
attribute values, and relations, being sets of similarly formed tuples.


• A DBMS includes data backup and recovery functions to ensure data
availability at all times.


As potentially many users rely on the availability of the data, the data must be 
safeguarded against possible calamities. Regular back-ups of the data set, and 
automatic recovery schemes provide an insurance against loss of data. 

• A DBMS allows the control of data redundancy. 


A well-designed database takes care of storing single facts only once. Storing a
fact multiple times (a phenomenon known as data redundancy) can lead to
situations in which stored facts may contradict each other, causing reduced
usefulness of the data. Redundancy, however, is not necessarily always problematic,
as long as we specify where it occurs so that it can be controlled.

3.4.2 Alternatives for data management

The decision whether or not to use a DBMS will depend, among other things, on 
how much data there is or will be, what type of use will be made of it, and how 
many users might be involved. 

On the small-scale side of the spectrum (when the data set is small, its use
relatively simple, and with just one user) we might use simple text files, and a text
processor. Think of a personal address book as an example, or a small set of
simple field observations. Text files offer no support for data analysis whatsoever,
except perhaps in alphabetical sorting.

If our data set is still small and numeric by nature, and we have a single type
of use in mind, a spreadsheet program will suffice. This might be the case if we
have a number of field observations with measurements that we want to prepare
for statistical analysis, for example. However, if we carry out region- or
nationwide censuses, with many observation stations and/or field observers and all
sorts of different measurements, one quickly needs a database to keep track of
all the data. It should also be noted that spreadsheets do not accommodate
concurrent use of the data set well, although they do support some data analysis,
especially when it comes to calculations over a single table, like averages, sums,
minimum and maximum values.

All such computations are usually restricted to just a single table of data. When
one wants to relate the values in the table with values of another nature in some
other table, some expertise and significant amounts of time are usually required
to make this happen.

3.4.3 The relational data model


A data model is a language that allows the definition of: 

• The structures that will be used to store the base data, 

• The integrity constraints that the stored data has to obey at all
moments in time, and

• The computer programs used to manipulate the data. 


For the relational data model, the structures used to define the database are
attributes, tuples and relations. Computer programs either perform data extraction
from the database without altering it, in which case we call them queries, or they
change the database contents, and we speak of updates or transactions. The
technical terms surrounding database technology are defined below.

Let us look at a tiny database example from a cadastral setting. It is illustrated in
Figure 3.2. This database consists of three tables, one for storing people's details,
one for storing parcel details and a third one for storing details concerning title
deeds. Various items of information are kept in the database, such as a taxation
identifier (TaxId) for people, a parcel identifier (PId) for parcels and the date of a
title deed (DeedDate).

PrivatePerson
TaxId     Surname   BirthDate
101-367   Garcia    10/05/1952
134-788   Chen      26/01/1964
101-490   Fakolo    14/09/1931

Parcel
PId    Location   AreaSize
3421   2001       435
8871   1462       550
2109   2323       1040
1515   2003       245

TitleDeed
Plot   Owner     DeedDate
2109   101-367   18/12/1996
8871   101-490   10/01/1984
1515   134-788   01/09/1991
3421   101-367   25/09/1996

Figure 3.2: A small example database consisting of three relations (tables), all
with three attributes, and resp. three, four and four tuples. PrivatePerson /
Parcel / TitleDeed are the names of the three tables. Surname is an attribute
of the PrivatePerson table; the Surname attribute value for person with TaxId
'101-367' is 'Garcia.'

Relations, tuples and attributes

In the relational data model, a database is viewed as a collection of relations, 
commonly also known as tables. 


A table or relation is itself a collection of tuples (or records). In fact, each 
table is a collection of tuples that are similarly shaped. 


By this, we mean that a tuple has a fixed number of named fields, also known
as attributes. All tuples in the same relation have the same named fields. In a
diagram, as in Figure 3.2, relations can be displayed in tabular form.


An attribute is a named field of a tuple, with which each tuple associates a 
value, the tuple’s attribute value. 


The example relations provided in the figure should clarify this. The
PrivatePerson table has three tuples; the Surname attribute value for the first tuple
illustrated is 'Garcia.'

The phrase 'that are similarly shaped' takes this a little bit further. It requires
that all values for the same attribute come from a single domain of values. An
attribute's domain is a (possibly infinite) set of atomic values such as the set of
integer number values, the set of real number values, etc. In our example
cadastral database, the domain of the Surname attribute, for instance, is String, so any
surname is represented as a sequence of text characters, i.e. as a string. The
availability of other domains depends on the DBMS, but usually integer (the whole
numbers), real (all numbers), date, yes/no and a few more are included.

When a relation is created, we need to indicate what type of tuples it will store.
This means that we must

1. Provide a name for the relation,

2. Indicate which attributes it will have, and

3. Set the domain of each attribute.

A relation definition obtained in this way is known as the relation schema of that
relation. The definition of relation schemas is an important part of database
design. Our example database has three relation schemas; one of them is TitleDeed.
The relation schemas together make up the database schema. For the database of
Figure 3.2, the relation schemas are given in Table 3.5. The primary key of each
relation, which will be defined and discussed below, is listed next to its schema.
Relation schemas are stable, and will rarely change over time. This is not true of
the tuples stored in tables: they typically change often, either because new tuples
are added, others are removed, or yet others see changes in their attribute values.

PrivatePerson (TaxId : string, Surname : string, BirthDate : date)   primary key: TaxId
Parcel (PId : number, Location : polygon, AreaSize : number)         primary key: PId
TitleDeed (Plot : number, Owner : string, DeedDate : date)           primary key: Plot, Owner

Table 3.5: The relation schemas for the three tables of the database in
Figure 3.2.

The set of tuples in a relation at some point in time is called the relation instance
at that moment. This tuple set is always finite: it is possible to count how many
tuples there are. Figure 3.2 gives us a single database instance, i.e. one relation
instance for each relation. One relation instance has three tuples, two of them
have four. Any relation instance always contains only tuples that comply with
the relation schema of the relation.
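
The distinction between a relation schema and a relation instance can be made concrete with a short sketch. It uses Python's built-in sqlite3 module, with one CREATE TABLE statement per schema of Table 3.5. Two simplifications are assumptions of this sketch rather than part of the example database: SQLite has no polygon domain, so Location holds the numeric value shown in Figure 3.2, and dates are stored as text. The parcel with PId 9999 is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relation schemas: a name, a list of attributes, and a domain per attribute.
conn.executescript(
    """
    CREATE TABLE PrivatePerson (TaxId TEXT, Surname TEXT, BirthDate TEXT);
    CREATE TABLE Parcel (PId INTEGER, Location INTEGER, AreaSize INTEGER);
    CREATE TABLE TitleDeed (Plot INTEGER, Owner TEXT, DeedDate TEXT);
    """
)

# Relation instances: the tuples of Figure 3.2.
conn.executemany(
    "INSERT INTO PrivatePerson VALUES (?, ?, ?)",
    [("101-367", "Garcia", "10/05/1952"),
     ("134-788", "Chen", "26/01/1964"),
     ("101-490", "Fakolo", "14/09/1931")],
)
conn.executemany(
    "INSERT INTO Parcel VALUES (?, ?, ?)",
    [(3421, 2001, 435), (8871, 1462, 550), (2109, 2323, 1040), (1515, 2003, 245)],
)
conn.executemany(
    "INSERT INTO TitleDeed VALUES (?, ?, ?)",
    [(2109, "101-367", "18/12/1996"), (8871, "101-490", "10/01/1984"),
     (1515, "134-788", "01/09/1991"), (3421, "101-367", "25/09/1996")],
)


def instance_size(table):
    """Number of tuples currently stored in the given relation."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


sizes = {t: instance_size(t) for t in ("PrivatePerson", "Parcel", "TitleDeed")}
print(sizes)  # {'PrivatePerson': 3, 'Parcel': 4, 'TitleDeed': 4}

# Adding a (hypothetical) tuple changes the instance, not the schema.
conn.execute("INSERT INTO Parcel VALUES (?, ?, ?)", (9999, 2500, 300))
parcel_size_after = instance_size("Parcel")
parcel_attributes = [row[1] for row in conn.execute("PRAGMA table_info(Parcel)")]
print(parcel_size_after, parcel_attributes)
```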
Finding tuples and building links between them

We have already stated that database systems are particularly good at storing 
large quantities of data. (Note: our example database is not even small, it is 
tiny!) The DBMS must support quick searches amongst many tuples. This is 
why the relational data model uses the notion of a key. 


A key of a relation comprises one or more attributes. A value for these 
attributes uniquely identifies a tuple. 


In other words, if we have a value for each of the key attributes we are
guaranteed to find no more than one tuple in the table with that combination of values.
It remains possible that there is no tuple for the given combination. In our
example database, the set {TaxId, Surname} is a key of the relation PrivatePerson:
if we know both a TaxId and a Surname value, we will find at most one tuple
with that combination of values.

Every relation has a key, though possibly it is the combination of all attributes.
Such a large key, however, is not handy because we must provide a value for
each of its attributes when we search for tuples. Clearly, we want a key to have
as few as possible attributes: the fewer, the better.*

If a key has just one attribute, it obviously cannot have fewer attributes. Some
keys have two attributes; an example is the key {Plot, Owner} of relation
TitleDeed. We need both attributes because there can be many title deeds for a
single plot (in case of plots that are sold often) but also many title deeds for a
single person (in case of wealthy persons). When we provide a value for a key,
we can look up the corresponding tuple in the table (if such a tuple exists).

* As an aside, note that an attribute such as AreaSize in relation Parcel is not a key, although it
appears to be one in Figure 3.2. The reason is that some day there could be a second parcel with
size 435, giving us two parcels with that value.

Figure 3.3: The table TitleDeed has a foreign key in its attribute Plot. This
attribute refers to key values of the Parcel relation, as indicated for two
TitleDeed tuples. The table TitleDeed actually has a second foreign key in
the attribute Owner, which refers to PrivatePerson tuples.

A tuple can refer to another tuple by storing that other tuple's key value. For
instance, a TitleDeed tuple refers to a Parcel tuple by including that tuple's key
value. The TitleDeed table has a special attribute Plot for storing such values.
The Plot attribute is called a foreign key because it refers to the primary key (PId)
of another relation (Parcel). This is illustrated in Figure 3.3. Two tuples of the
same relation instance can have identical foreign key values: for instance, two
TitleDeed tuples may refer to the same Parcel tuple. A foreign key, therefore,
is not a key of the relation in which it appears, despite its name! A foreign key
must have as many attributes as the primary key that it refers to.
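
How a DBMS enforces keys and foreign keys can be sketched as follows, again with Python's built-in sqlite3 module. The declarations follow the keys named in the text (PId for Parcel, {Plot, Owner} for TitleDeed). All tuples inserted after the initial set-up are hypothetical test values, and the helper is_accepted is introduced here only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only checks foreign keys when enabled
conn.executescript(
    """
    CREATE TABLE Parcel (
        PId INTEGER PRIMARY KEY,               -- single-attribute key
        Location INTEGER,
        AreaSize INTEGER
    );
    CREATE TABLE TitleDeed (
        Plot INTEGER REFERENCES Parcel (PId),  -- foreign key into Parcel
        Owner TEXT,
        DeedDate TEXT,
        PRIMARY KEY (Plot, Owner)              -- two-attribute key
    );
    INSERT INTO Parcel VALUES (3421, 2001, 435), (8871, 1462, 550);
    INSERT INTO TitleDeed VALUES (3421, '101-367', '25/09/1996');
    """
)


def is_accepted(sql, params):
    """Return True if the DBMS accepts the insertion, False if a key rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# A second parcel with an existing key value is rejected ...
duplicate_key = is_accepted("INSERT INTO Parcel VALUES (?, ?, ?)", (3421, 2500, 300))
# ... but a second parcel of size 435 is fine, because AreaSize is not a key.
same_area = is_accepted("INSERT INTO Parcel VALUES (?, ?, ?)", (9999, 2500, 435))
# Two title deeds may refer to the same parcel: a foreign key is not a key.
second_deed = is_accepted(
    "INSERT INTO TitleDeed VALUES (?, ?, ?)", (3421, "134-788", "01/02/2001")
)
# A title deed that refers to a non-existent parcel violates the foreign key.
dangling = is_accepted(
    "INSERT INTO TitleDeed VALUES (?, ?, ?)", (7777, "101-367", "01/02/2001")
)
print(duplicate_key, same_area, second_deed, dangling)  # False True True False
```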
3.4.4 Querying a relational database

We will now look at the three most elementary query operators. These are quite
powerful because they can be combined to define queries of higher complexity.

The three query operators have some traits in common. First, all of them require
input and produce output, and both input and output are relations! This
guarantees that the output of one query (a relation) can be the input of another query,
and this gives us the possibility to build more and more complex queries, if we
want.

The first query operator is called tuple selection; it is illustrated in Figure 3.4(a). 


Tuple selection works like a filter: it allows tuples that meet the selection 
condition to pass, and disallows tuples that do not meet the condition. 


The operator is given some input relation, as well as a selection condition about 
tuples in the input relation. A selection condition is a truth statement about a 
tuple's attribute values such as: AreaSize > 1000. For some tuples in Parcel this 
statement will be true, for others it will be false. Tuple selection on the Parcel 
relation with this condition will result in a set of Parcel tuples for which the 
condition is true. 

A second operator is also illustrated in Figure 3.4. It is called attribute projection. 
Besides an input relation, this operator requires a list of attributes, all of which 
should be attributes of the schema of the input relation. 


Attribute projection works like a tuple formatter: it passes through all tuples 
of the input, but reshapes each of them in the same way. 




The output relation of this operator has as its schema only the list of attributes 
given, and we say that the operator projects onto these attributes. Contrary to 
the first operator, which produces fewer tuples, this operator produces fewer 
attributes compared to the input relation. 

The most common way of defining queries in a relational database is through
the SQL language. SQL stands for Structured Query Language. The two queries
of Figure 3.4 are written in this language as follows:

(a) SELECT *
    FROM Parcel
    WHERE AreaSize > 1000

    Tuple selection from the Parcel relation, using the condition
    AreaSize > 1000. The * indicates that we want to extract all attributes
    of the input relation.

(b) SELECT PId, Location
    FROM Parcel

    Attribute projection from the Parcel relation. The SELECT-clause indicates
    that we only want to extract the two attributes PId and Location. There is
    no WHERE-clause in this query.


Queries like the two above do not create stored tables in the database. This is
why the result tables have no name: they are virtual tables. The result of a query
is a table that is shown to the user who executed the query. Whenever the user
closes her/his view on the query result, that result is lost. The SQL code for the
query is stored, however, for future use. The user can re-execute the query
to obtain a view on the result once more.
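
The two unary queries can be tried out on the Parcel tuples of Figure 3.2. The sketch below uses Python's built-in sqlite3 module. Storing Location as a plain number is a simplification made here, and the ORDER BY clause is added only to make the printed order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Parcel (PId INTEGER, Location INTEGER, AreaSize INTEGER)")
conn.executemany(
    "INSERT INTO Parcel VALUES (?, ?, ?)",
    [(3421, 2001, 435), (8871, 1462, 550), (2109, 2323, 1040), (1515, 2003, 245)],
)

# (a) Tuple selection: fewer tuples, all attributes.
selection = conn.execute("SELECT * FROM Parcel WHERE AreaSize > 1000").fetchall()

# (b) Attribute projection: all tuples, fewer attributes.
projection = conn.execute("SELECT PId, Location FROM Parcel ORDER BY PId").fetchall()

print(selection)   # [(2109, 2323, 1040)]
print(projection)  # [(1515, 2003), (2109, 2323), (3421, 2001), (8871, 1462)]

# Neither query created a stored table: Parcel is still the only one.
stored_tables = [
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(stored_tables)  # ['Parcel']
```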

Our third query operator differs from the two above in that it requires two input
relations. The operator is called the join, and is illustrated in Figure 3.5.


The join operator takes two input relations and produces one output relation,
gluing two tuples together (one from each input relation), to form a bigger
tuple, if they meet a specified condition.


The output relation of this operator has as attributes those of the first and those
of the second input relation. The number of attributes therefore increases. The
output tuples are obtained by taking a tuple from the first input relation and
'gluing' it to a tuple from the second input relation. The join operator uses a
condition that expresses which tuples from the first relation are combined ('glued')
with which tuples from the second. The example of Figure 3.5 combines
TitleDeed tuples with Parcel tuples, but only those for which the foreign key Plot
matches with primary key PId.

The above join query is also easily expressed in SQL as follows.

SELECT *
FROM TitleDeed, Parcel
WHERE TitleDeed.Plot = Parcel.PId

The FROM-clause identifies the two input relations; the WHERE-clause states
the join condition. It is often not sufficient to use just one operator for extracting
sensible information from a database. The strength of the above operators lies
in the fact that they can be combined to produce more advanced and useful
query definitions. We provide a final example to illustrate this. Take another
look at the join of Figure 3.5. Suppose we really wanted to obtain combined
TitleDeed/Parcel information, but only for parcels with a size over 1000, and we
only wanted to see the owner identifier and deed date of such title deeds.

We can take the result of the above join, and select the tuples that show a parcel
size over 1000. The result of this tuple selection can then be taken as the input for
an attribute projection that only leaves Owner and DeedDate. This is illustrated
in Figure 3.6.

Finally, we may look at the SQL statement that would give us the query of
Figure 3.6. It can be written as

SELECT Owner, DeedDate
FROM TitleDeed, Parcel
WHERE TitleDeed.Plot = Parcel.PId AND AreaSize > 1000
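
Both the plain join and the combined query can be run against the tuples of Figure 3.2. The sketch below uses Python's built-in sqlite3 module, with Location stored as a plain number as a simplification. The nested form at the end is not from the text: it is one possible way to spell out the join, selection and projection steps as separate queries feeding into each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Parcel (PId INTEGER, Location INTEGER, AreaSize INTEGER);
    CREATE TABLE TitleDeed (Plot INTEGER, Owner TEXT, DeedDate TEXT);
    INSERT INTO Parcel VALUES
        (3421, 2001, 435), (8871, 1462, 550), (2109, 2323, 1040), (1515, 2003, 245);
    INSERT INTO TitleDeed VALUES
        (2109, '101-367', '18/12/1996'), (8871, '101-490', '10/01/1984'),
        (1515, '134-788', '01/09/1991'), (3421, '101-367', '25/09/1996');
    """
)

# The join: each TitleDeed tuple is glued to the Parcel tuple it refers to.
join = conn.execute(
    "SELECT * FROM TitleDeed, Parcel WHERE TitleDeed.Plot = Parcel.PId"
).fetchall()
print(len(join), len(join[0]))  # 4 6

# The combined query: join, tuple selection and attribute projection in one statement.
combined = conn.execute(
    "SELECT Owner, DeedDate FROM TitleDeed, Parcel "
    "WHERE TitleDeed.Plot = Parcel.PId AND AreaSize > 1000"
).fetchall()
print(combined)  # [('101-367', '18/12/1996')]

# The same result, written as three queries whose output is the next one's input.
stepwise = conn.execute(
    """
    SELECT Owner, DeedDate FROM (
        SELECT * FROM (
            SELECT * FROM TitleDeed, Parcel WHERE TitleDeed.Plot = Parcel.PId
        ) WHERE AreaSize > 1000
    )
    """
).fetchall()
print(stepwise == combined)  # True
```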

Figure 3.4: The two unary query operators: (a) tuple selection has a single
table as input and produces another table with fewer tuples. Here, the
condition was that AreaSize must be over 1000; (b) attribute projection has
a single table as input and produces another table with fewer attributes.
Here, the projection is onto the attributes PId and Location.

Figure 3.5: The essential binary query operator: join. The join condition
for this example is TitleDeed.Plot = Parcel.PId, which expresses a foreign
key/key link between TitleDeed and Parcel. The result relation has
3 + 3 = 6 attributes.

Figure 3.6: A combined selection/projection/join query, selecting owners
and deed dates for parcels with a size larger than 1000. The join is carried
out first, then follows a tuple selection on the result tuples of the join.
Finally, an attribute projection is carried out.

3.5.1 Linking GIS and DBMS

GIS software provides support for spatial data and thematic or attribute data.
GISs have traditionally stored spatial data and attribute data separately. This
required the GIS to provide a link between the spatial data (represented with
rasters or vectors), and their non-spatial attribute data. The strength of GIS
technology lies in its built-in 'understanding' of geographic space and all
functions that derive from this, for purposes such as storage, analysis, and map
production. GIS packages themselves can store tabular data; however, they do not
always provide a full-fledged query language to operate on the tables.

DBMSs have a long tradition in handling attribute (i.e. administrative,
non-spatial, tabular, thematic) data in a secure way, for multiple users at the same
time. Arguably, DBMSs offer much better table functionality, since they are
specifically designed for this purpose. A lot of the data in GIS applications is
attribute data, so it made sense to use a DBMS for it. For this reason, many GIS
applications have made use of external DBMSs for data support. In this role, the
DBMS serves as a centralized data repository for all users, while each user runs
her/his own GIS software that obtains its data from the DBMS. This meant that
a GIS had to link the spatial data represented with rasters or vectors, and the
attribute data stored in an external DBMS.

With raster representations, each raster cell stores a characteristic value. This
value can be used to look up attribute data in an accompanying database table.
For instance, the land use raster of Figure 3.7 indicates the land use class for each
of its cells, while an accompanying table provides full descriptions for all classes,
including perhaps some statistical information for each of the types. Observe the
similarity with the key/foreign key concept in relational databases.

Figure 3.7: A raster representing land use and a related table providing full
text descriptions (amongst others) of each land use class.
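
The raster look-up described above can be sketched in a few lines of Python. The cell values, class codes and descriptions below are hypothetical and are not those of Figure 3.7; they only illustrate how a cell value plays the role of a foreign key into the accompanying table.

```python
from collections import Counter

# Hypothetical land use raster: each cell stores a class code.
raster = [
    [1, 1, 2],
    [1, 3, 2],
    [3, 3, 2],
]

# Hypothetical accompanying table, keyed on the class code.
land_use_table = {
    1: {"description": "Forest"},
    2: {"description": "Agriculture"},
    3: {"description": "Built-up area"},
}


def describe(row, col):
    """Look up the full description for the cell at (row, col)."""
    code = raster[row][col]  # the cell value acts like a foreign key
    return land_use_table[code]["description"]


# Simple per-class statistics of the kind the table might also hold.
cell_counts = Counter(code for row in raster for code in row)

print(describe(1, 1))     # Built-up area
print(dict(cell_counts))  # {1: 3, 2: 3, 3: 3}
```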


With vector representations, our spatial objects (whether they are points, lines
or polygons) are automatically given a unique identifier by the system. This
identifier is usually just called the object ID or feature ID and is used to link the
spatial object (as represented in vectors) with its attribute data in an attribute
table. The principle applied here is similar to that in raster settings, but in this
case each object has its own identifier. The ID in the vector system functions as
a key, and any reference to an ID value in the attribute database is a foreign key
reference to the vector system. For example, in Figure 3.8, Parcel is a table with
attributes, linked to the spatial objects stored in a GIS by the Location column.

Obviously, several tables may make references to the vector system, but it is not
uncommon to have some main table for which the ID is actually also the key.

Figure 3.8: Storage and linking of vector attribute data between a GIS and
DBMS.

3.5.2 Spatial database functionality

DBMS vendors have over the last 20 years recognized the need for storing more
complex data, like spatial data. The main problem was that additional
functionality is needed by a DBMS in order to process and manage spatial data. As
the capabilities of our hardware to process information have increased, so too
has the desire for better ways to represent and manage spatial data. During the
1990s, object-oriented and object-relational data models were developed for just
this purpose. These extend standard relational models with support for objects,
including 'spatial' objects.

Currently, GIS software packages are able to store spatial data using a range
of commercial and open source DBMSs such as Oracle, Informix, IBM DB2,
Sybase, and PostgreSQL, with the help of spatial extensions. Some GIS software
packages have integrated database 'engines', and therefore do not need these
extensions. ESRI's ArcGIS, for example, has the main components of the MS Access
database software built-in. This means that the designer of a GIS application can
choose whether to store the application data in the GIS or in the DBMS. Spatial
databases, also known as geodatabases,* are implemented directly on existing
DBMSs, using extension software to allow them to handle spatial objects.


A spatial database allows users to store, query and manipulate collections
of spatial data.


There are several advantages in doing this, as we will see below. Put simply,
spatial data can be stored in a special database column, known as the geometry
column (or feature or shape, depending on the specific software package),
as shown in Figure 3.9. This means GISs can rely fully on DBMS support for
spatial data, making use of a DBMS for data query and storage (and multi-user
support), and GIS for spatial functionality. Small-scale GIS applications may not
require a multi-user capability, and can be supported by spatial data support
from a personal database.

* Often, the term 'geodatabase' is used to refer to a specific kind of spatial database created
with ESRI's ArcGIS software. Here we use it to refer to spatially-enabled DBMSs in general.


Parcel

PId    Geometry                                                                  OwnerID
3421   "MULTIPOLYGON(((257462.?04979333 464780.750851061,257463.89798...)))"     435
8871   "MULTIPOLYGON(((257409.813950544 464789.91585049,257407.896903...)))"     550
2109   "MULTIPOLYGON(((257785.?14911912 464796.839972167,257782.59794...)))"     1040
1515   "MULTIPOLYGON(((257790.672100448 464807.13792585,257788.608078...)))"     245
3434   "MULTIPOLYGON(((257435.527950478 464803.92887633,257428.254887...)))"     486
6371   "MULTIPOLYGON(((257432.476077854 464813.848852072,257433.147910...)))"    950
2209   "MULTIPOLYGON(((257444.?88027332 464826.555046319,257446.43201...)))"     1840
1505   "MULTIPOLYGON(((256293.760107491 464935.203846095,256292. 00881...)))"    145

Figure 3.9: Geometry data stored directly in a spatial database table.


A geodatabase allows a wide variety of users to access large data sets (both
geographic and alphanumeric), and the management of their relations, guaranteeing
their integrity. The Open Geospatial Consortium (OGC) has released a series
of standards relating to geodatabases that (amongst other things) define:


• Which tables must be present in a spatial database (i.e. geometry columns
table and spatial reference system table)

• The data formats, called 'Simple Features' (i.e. point, line, polygon, etc.)

• A set of SQL-like instructions for geographic analysis.

The architecture of a spatial database differs from a standard RDBMS not only
because it can handle geometry data and manage projections, but also because it
offers a larger set of commands that extend standard SQL (e.g. distance
calculations, buffers, overlay, conversion between coordinate systems, etc.).

At the time of writing, spatial databases support the storage of image data,
but that support is still relatively limited and under development. As with
the hardware and software trends identified in Section 3.2.1, the capabilities
of spatial databases will continue to evolve over time. Currently, ESRI's ArcGIS
geodatabase can store topological relationships directly in the database,
providing support for different kinds of features (objects) and their behaviour
(relations with other objects), as well as ways to validate these relations and
behaviours. Effectively, this is similar to the functionality offered by traditional
DBMSs, but with geospatial data.
Querying a spatial database

A spatial DBMS provides support for geographic coordinate systems and
transformations. It also provides storage of the relationships between features,
including the creation and storage of topological relationships. As a result, one is
able to use functions for 'spatial query' (exploring spatial relationships). To
illustrate, a spatial query using SQL to find all the Thai restaurants within 2 km
of a given hotel would look like this:

SELECT R.Name
FROM Restaurants AS R,
     Hotels AS H
WHERE R.Type = 'Thai' AND
      H.Name = 'Hilton' AND
      -- keep restaurants whose geometry falls in a 2000 m buffer around the hotel
      ST_Intersects(R.Geometry, ST_Buffer(H.Geometry, 2000))


In this case the WHERE clause uses the ST_Intersects function to perform a
spatial join between a 2000 m buffer of the selected hotel and the selected subset of
restaurants. The Geometry column carries the spatial data.
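
The logic of this query can also be illustrated outside a DBMS. The following is a minimal Python sketch, assuming that hotels and restaurants are stored as points in a projected coordinate system with metre units, so that intersecting a 2000 m buffer reduces to a simple distance test. All names and coordinates are hypothetical and are not part of the example database.

```python
import math

# Hypothetical sample data: point locations in a projected coordinate
# system with metre units (names and coordinates are invented).
hotels = {"Hilton": (257000.0, 464000.0)}
restaurants = [
    {"name": "Bangkok Garden", "type": "Thai", "xy": (257900.0, 464500.0)},
    {"name": "Chiang Mai", "type": "Thai", "xy": (260500.0, 464000.0)},
    {"name": "La Piazza", "type": "Italian", "xy": (257100.0, 464100.0)},
]


def restaurants_near(hotel_name, cuisine, radius_m):
    """Return names of restaurants of one cuisine within radius_m of a hotel."""
    hx, hy = hotels[hotel_name]
    return [
        r["name"]
        for r in restaurants
        if r["type"] == cuisine
        and math.hypot(r["xy"][0] - hx, r["xy"][1] - hy) <= radius_m
    ]


print(restaurants_near("Hilton", "Thai", 2000.0))
```

A spatial DBMS performs the same kind of test on arbitrary geometries (lines and polygons as well as points), and can use a spatial index instead of comparing every pair of rows.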
Summary


Data management and processing functions are central to GIS. This chapter has
attempted to provide an overview of DBMS and geodatabase technology. It has
examined data management and processing methods and techniques for
organizing our spatial and attribute data using GIS and databases.

Traditionally, GIS were better suited to managing spatial data and DBMSs to
managing attribute data. As a result, GIS were often linked to external DBMSs for
substantial applications or projects requiring more powerful attribute data
management capabilities.

Spatial databases are a marriage of GIS and traditional DBMS. They support
storage and manipulation of both geometry and attribute data, including
spatial queries. The functions and capabilities of spatial databases are constantly
improving. In the near future it is likely that we will use spatial databases
exclusively for storage of all geometric and attribute data.
Questions

1. Consider the hypothetical case that your institute or company equips you 
for field surveys with a GPS receiver, a mobile phone (global coverage) 
and a laptop. Compare that situation with one where your employer only 
gives you a notepad and pencil for field surveying. What is the gain in 
time efficiency? What sort of project can be contemplated now that was 
impossible before? 

2. Table 3.2 lists various ways of getting digital data into a GIS. From a
perspective of data accuracy and data correctness, what do you think are the
best choices? In your field, what is the most common technique currently
in use? Do you feel better techniques may be available?

3. In Figure 3.2 and Table 3.5 we illustrated the structure of our example
database. In what (fundamental) way does the table differ from the figure?
Why have the attributes been grouped the way they have? (Hint: look for
the obvious explanation.)

4. The following is a correct SQL query on the database of Figure 3.2.
Explain in words what information it will produce when executed against
that database.

SELECT PrivatePerson.Surname, TitleDeed.Plot
FROM PrivatePerson, TitleDeed
WHERE PrivatePerson.TaxId = TitleDeed.Owner AND
      PrivatePerson.BirthDate > 1/1/1960

Determine what table the query will result in. If possible, draw up a
diagram like Figure 3.5 (but without showing data values) that demonstrates
what the query does.
Chapter 4

Spatial referencing and positioning 


In the early days of GIS, users were mainly handling spatially referenced data
from a single country. This data was usually derived from paper maps
published by the country's mapping organization. Nowadays, GIS users are
combining spatial data from a given country with global spatial data sets,
reconciling spatial data from published maps with coordinates established with satellite
positioning techniques and integrating their spatial data with that from
neighbouring countries. To perform these kinds of tasks successfully, GIS users need
to understand basic spatial referencing concepts.

This chapter has two parts. In Section 4.1, we discuss the relevance and actual use
of reference surfaces, coordinate systems and coordinate transformations. In
Section 4.2 we look more closely at satellite-based positioning. The introduction
of global positioning techniques has made it possible to unambiguously
determine a position in space. These developments have laid the foundation for the
integration of all spatial data within a single global 3D spatial reference system,
which we may see emerge within the next 10-15 years.
4.1 Spatial referencing


One of the defining features of GIS is their ability to combine spatially referenced
data. A frequently occurring issue is the need to combine spatial data from
different sources that use different spatial reference systems. This section provides
a broad background of relevant concepts relating to the nature of spatial
reference systems and the translation of data from one spatial referencing system into
another.
4.1.1 Reference surfaces for mapping


The surface of the Earth is anything but uniform. The oceans can be treated
as reasonably uniform, but the surface or topography of the land masses
exhibits large vertical variations between mountains and valleys. These variations
make it impossible to approximate the shape of the Earth with any reasonably
simple mathematical model. Consequently, two main reference surfaces have
been established to approximate the shape of the Earth. One reference surface is
called the Geoid, the other reference surface is the ellipsoid. These are illustrated
in Figure 4.1. Below, we look at and discuss the respective uses of each of these
surfaces.



Figure 4.1: The Earth's surface, and two reference surfaces used to
approximate it: the Geoid, and a reference ellipsoid. The Geoid separation (N) is
the deviation between the Geoid and a reference ellipsoid.
The Geoid and the vertical datum


We can simplify matters by imagining that the entire Earth's surface is covered
by water. If we ignore tidal and current effects on this 'global ocean', the
resultant water surface is affected only by gravity. This has an effect on the shape of
this surface because the direction of gravity (more commonly known as the plumb
line) is dependent on the mass distribution inside the Earth. Due to irregularities
or mass anomalies in this distribution, the 'global ocean' results in an undulated
surface. This surface is called the Geoid (Figure 4.2). The plumb line through
any surface point is always perpendicular to it.


Figure 4.2: The Geoid, exaggerated to illustrate the complexity of its
surface.



The Geoid is used to describe heights. In order to establish the Geoid as
reference for heights, the ocean's water level is registered at coastal places over
several years using tide gauges (mareographs). Averaging the registrations largely
eliminates variations of the sea level with time. The resulting water level
represents an approximation to the Geoid and is called the mean sea level. For
the Netherlands and Germany, the local mean sea level is realized through the
Amsterdam tide gauge (zero height). We can determine the height of a point in
Enschede with respect to the Amsterdam tide gauge using a technique known
as geodetic levelling (Figure 4.3). The result of this process will be the height
above local mean sea level for the Enschede point. The height determined with
respect to a tide-gauge station is known as the orthometric height (height H above
the Geoid).

Obviously, there are several realizations of local mean sea levels (also called local
vertical datums) in the world. They are parallel to the Geoid but offset by up to
a couple of metres. This offset is due to local phenomena such as ocean currents,
tides, coastal winds, water temperature and salinity at the location of the tide
gauge. Care must be taken when using heights from another local vertical datum.
For example, this might be the case in the border area of adjacent nations. Even
within a country, heights may differ depending on the tide gauge (mean
sea level point) to which they are related. As an example, the mean sea level from the
Atlantic to the Pacific coast of the USA increases by 0.6 to 0.7 m. The tide gauge
(zero height) of the Netherlands differs by -2.34 metres from the tide gauge (zero
height) of the neighbouring country Belgium.

The local vertical datum is implemented through a levelling network (see
Figure 4.3(a)). A levelling network consists of benchmarks, whose height above
mean sea level has been determined through geodetic levelling. The
implementation of the datum enables easy user access. The surveyors do not need to start
from scratch (i.e. from the Amsterdam tide gauge) every time they need to
determine the height of a new point. They can use the benchmark of the levelling
network that is closest to the point of interest (Figure 4.3(b)).
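
The idea of working from the nearest benchmark can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the benchmark names, positions, heights and measured height differences below are all hypothetical, and the levelling itself is reduced to summing the height differences observed between the benchmark and the new point.

```python
import math

# Hypothetical benchmarks: (name, x, y, height above mean sea level in metres).
BENCHMARKS = [
    ("BM-1", 1000.0, 1000.0, 12.431),
    ("BM-2", 4000.0, 1500.0, 15.872),
    ("BM-3", 2500.0, 5000.0, 9.305),
]


def nearest_benchmark(x, y):
    """Return the benchmark closest to the point (x, y)."""
    return min(BENCHMARKS, key=lambda bm: math.hypot(bm[1] - x, bm[2] - y))


def height_from_levelling(benchmark_height, height_differences):
    """Add the levelled height differences along the line to the benchmark height."""
    return benchmark_height + sum(height_differences)


# Height of a new point at (3800, 1200), levelled in three set-ups from the
# nearest benchmark (the height differences are invented values).
name, _, _, h_bm = nearest_benchmark(3800.0, 1200.0)
h_new = height_from_levelling(h_bm, [0.512, -0.203, 1.104])
print(name, round(h_new, 3))
```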


Figure 4.3: A levelling network implements a local vertical datum:
(a) network of levelling lines starting from the Amsterdam tide gauge,
showing some of the benchmarks; (b) how the orthometric height (H)
is determined for some point, working from the nearest benchmark.


As a result of satellite gravity missions, it is currently possible to determine the
height (H) above the Geoid with centimetre-level accuracy. It is foreseeable that
a global vertical datum may become ubiquitous in the next 10-15 years. If all
published maps are also using this global vertical datum by that time, heights
will become globally comparable, effectively making local vertical datums
redundant for GIS users.
The ellipsoid


Above, we have defined a physical surface, the Geoid, as a reference surface for
heights. We also need a reference surface for the description of the horizontal
coordinates of points of interest. Since we will later project these horizontal
coordinates onto a mapping plane, the reference surface for horizontal coordinates
requires a mathematical definition and description. The most convenient
geometric reference is the oblate ellipsoid (Figure 4.4). It provides a relatively simple
figure which fits the Geoid to a first order approximation, though for small-scale
mapping purposes a sphere may be used. An ellipsoid is formed when an ellipse
is rotated about its minor axis. This ellipse which defines an ellipsoid or spheroid
is called a meridian ellipse.¹



Figure 4.4: An oblate ellipse, defined by its semi-major axis a and
semi-minor axis b.


The shape of an ellipsoid may be defined in a number of ways, but in geodetic
practice the definition is usually by its semi-major axis and flattening
(Figure 4.4). Flattening f is dependent on both the semi-major axis a and the
semi-minor axis b.

$$ f = \frac{a - b}{a} $$

The ellipsoid may also be defined by its semi-major axis a and its eccentricity e,
which is given by:

$$ e^2 = 1 - \frac{b^2}{a^2} = \frac{a^2 - b^2}{a^2} $$

Given one axis and any one of the other three parameters, the other two can be
derived. Typical values of the parameters for an ellipsoid are:

$$ a = 6378135.00\ \text{m}, \quad b = 6356750.52\ \text{m}, \quad f = \frac{1}{298.26}, \quad e = 0.08181881066 $$

¹ Notice that ellipsoid and spheroid are used here to refer to the same thing.
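
The typical values quoted above can be checked numerically. The short Python sketch below simply evaluates the two definitions for the given a and b; the function names are chosen for illustration only.

```python
import math


def flattening(a, b):
    """Flattening f = (a - b) / a."""
    return (a - b) / a


def eccentricity(a, b):
    """First eccentricity e, from e^2 = (a^2 - b^2) / a^2."""
    return math.sqrt((a * a - b * b) / (a * a))


a = 6378135.00  # semi-major axis in metres (value from the text)
b = 6356750.52  # semi-minor axis in metres (value from the text)

f = flattening(a, b)
e = eccentricity(a, b)
print(f"1/f = {1 / f:.2f}")
print(f"e   = {e:.8f}")
```

Both results should agree with the values listed in the text to the number of digits given there.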


Many different ellipsoids have been defined. Local ellipsoids have been
established to fit the Geoid (mean sea level) well over an area of local interest, which
in the past was never larger than a continent. This meant that the differences
between the Geoid and the reference ellipsoid could effectively be ignored,
allowing accurate maps to be drawn in the vicinity of the datum (Figure 4.5).

With increasing demands for global surveying, work is underway to develop
global reference ellipsoids. In contrast to local ellipsoids, which apply only to a
specific country or localised area of the Earth's surface, global ellipsoids
approximate the Geoid as a mean Earth ellipsoid. The International Union of Geodesy
and Geophysics (IUGG) plays a central role in establishing these reference
figures.

Figure 4.5: The Geoid, a globally best fitting ellipsoid for it, and a regionally
best fitting ellipsoid for it, for a chosen region. Adapted from: Ordnance
Survey of Great Britain. A Guide to Coordinate Systems in Great Britain.

In 1924, the general assembly of the IUGG in Madrid introduced the ellipsoid
determined by Hayford in 1909 as the international ellipsoid. However, according
to present knowledge, the values for this ellipsoid give an insufficient
approximation. At the 1967 general assembly of the IUGG in Lucerne, the 1924 reference
system was replaced by the Geodetic Reference System 1967 (GRS 1967). It
represents a good approximation (as of 1967) to the mean Earth figure.

For some time, the Geodetic Reference System 1967 was used in the planning of
new geodetic surveys. For example, the Australian Datum (1966) and the South
American Datum (1969) are based upon this ellipsoid. However, at its 1979 general
assembly in Canberra, the IUGG recognized that the GRS 1967 no longer
represented the size and shape of the Earth to an adequate accuracy. Consequently,
it was replaced by the Geodetic Reference System 1980 (GRS80) ellipsoid.


Name                    a (m)       b (m)       f
International (1924)    6378388.    6356912.    1 : 297.000
GRS 1967                6378160.    6356775.    1 : 298.247
GRS 1980 and WGS84      6378137.    6356752.    1 : 298.257

Table 4.1: Three global ellipsoids defined by a semi-major axis a, semi-minor
axis b, and flattening f. The GRS80 and WGS84 can be considered identical
for all practical purposes.
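
Since one axis and one further parameter are enough to derive the others, the b column of Table 4.1 can be recomputed from a and f. The following Python sketch does this for the three ellipsoids, using only the values printed in the table; the names used in the code are illustrative.

```python
# (name, semi-major axis a in metres, inverse flattening 1/f) from Table 4.1
ELLIPSOIDS = [
    ("International (1924)", 6378388.0, 297.000),
    ("GRS 1967", 6378160.0, 298.247),
    ("GRS 1980 and WGS84", 6378137.0, 298.257),
]


def semi_minor_axis(a, inverse_f):
    """Semi-minor axis b = a * (1 - f), with f = 1 / inverse_f."""
    return a * (1.0 - 1.0 / inverse_f)


for name, a, inverse_f in ELLIPSOIDS:
    b = semi_minor_axis(a, inverse_f)
    print(f"{name:22s} a = {a:.0f} m  b = {b:.0f} m")
```

The computed values agree with the tabulated semi-minor axes to within a metre, which is the precision at which the table is printed.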
The local horizontal datum

Ellipsoids have varying position and orientations. An ellipsoid is positioned and
oriented with respect to the local mean sea level by adopting a latitude (φ),
longitude (λ) and ellipsoidal height (h) of a so-called fundamental point and an
azimuth to an additional point. We say that this defines a local horizontal datum.

Notice that the terms horizontal datum and geodetic datum are being treated as
equivalent and interchangeable.

Several hundred local horizontal datums exist in the world. The reason is
obvious: different local ellipsoids with varying position and orientation had to
be adopted to best fit the local mean sea level in different countries or regions.

An example is the Potsdam Datum, the local horizontal datum used in Germany.
The fundamental point is in Rauenberg and the underlying ellipsoid is the Bessel
ellipsoid (a = 6,377,397.156 m, b = 6,356,079.175 m). We can determine the
latitude and longitude (φ, λ) of any other point in Germany with respect to this local
horizontal datum using geodetic positioning techniques, such as triangulation
and trilateration. The result of this process will be the geographic (or horizontal)
coordinates (φ, λ) of the new point in the Potsdam Datum.

A local horizontal datum is realized through a triangulation network. Such a
network consists of monumented points forming a network of triangular mesh
elements (Figure 4.6). The angles in each triangle are measured in addition to
at least one side of a triangle; the fundamental point is also a point in the
triangulation network. The angle measurements and the adopted coordinates of the
fundamental point are then used to derive geographic coordinates (φ, λ) for all
monumented points of the triangulation network.
Within this framework, users do not need to start from scratch (i.e. from the
fundamental point) in order to determine the geographic coordinates of a new
point. They can use the monument of the triangulation network that is closest to
the new point. The extension and re-measurement of the network is nowadays
done through satellite measurements.



Figure 4.6: The old primary triangulation network in the Netherlands, made up
of 77 points (mostly church towers). The extension and re-measurement of
the network is nowadays done through satellite measurements. Adapted
from original figure by 'Dutch Cadastre and Land Registers', now called het
Kadaster.
The global horizontal datum

Local horizontal datums have been established to fit the Geoid well over the
area of local interest, which in the past was never larger than a continent. With
increasing demands for global surveying, activities are underway to establish
global reference surfaces. The motivation is to make geodetic results mutually
comparable and to provide coherent results also to other disciplines like
astronomy and geophysics.

The most important global (geocentric) spatial reference system for the GIS
community is the International Terrestrial Reference System (ITRS). It is a
three-dimensional coordinate system with a well-defined origin (the centre of mass
of the Earth) and three orthogonal coordinate axes (X, Y, Z). The Z-axis points
towards a mean Earth north pole. The X-axis is oriented towards a mean
Greenwich meridian and is orthogonal to the Z-axis. The Y-axis completes the
right-handed reference coordinate system (Figure 4.7a).



Figure 4.7: (a) The International Terrestrial Reference System (ITRS), and
(b) the International Terrestrial Reference Frame (ITRF) visualized as a
distributed set of ground control stations (represented by red points).

The ITRS is realized through the International Terrestrial Reference Frame (ITRF),
a distributed set of ground control stations that measure their position
continuously using GPS (Figure 4.7b). Constant re-measuring is needed because of the
involvement of new control stations and ongoing geophysical processes (mainly
tectonic plate motion) that deform the Earth's crust at measurable global,
regional and local scales. These deformations cause positional differences in time,
and have resulted in more than one realization of the ITRS. Examples are the
ITRF96 and the ITRF2000. The ITRF96 was established on 1 January 1997.
This means that the measurements use data up to 1996 to fix the geocentric
coordinates (X, Y and Z in metres) and velocities (positional change in X, Y and
Z in metres per year) at the different stations. The velocities are used to
propagate the measurements to other epochs (times). The trend is to use the ITRF
everywhere in the world for reasons of global compatibility.
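
Propagating station coordinates to another epoch amounts to a linear extrapolation with the station velocity. The following Python sketch illustrates this under the assumption of constant velocities; the station coordinates and velocities are invented for illustration and do not belong to any real ITRF station.

```python
def propagate(position, velocity, epoch_from, epoch_to):
    """Linearly move geocentric coordinates from one epoch to another.

    position: (X, Y, Z) in metres at epoch_from
    velocity: (vX, vY, vZ) in metres per year
    epochs:   decimal years, e.g. 1997.0 for 1 January 1997
    """
    dt = epoch_to - epoch_from
    return tuple(p + v * dt for p, v in zip(position, velocity))


# Hypothetical station: coordinates and velocities are invented values.
xyz_1997 = (3900000.000, 470000.000, 5000000.000)
velocity = (-0.015, 0.017, 0.010)

xyz_2007 = propagate(xyz_1997, velocity, 1997.0, 2007.0)
print(xyz_2007)
```

With velocities of one to two centimetres per year, ten years of plate motion already shifts the station by more than a decimetre, which is why the epoch of a set of coordinates matters.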

GPS uses the World Geodetic System 1984 (WGS84) as its reference system. It
has been refined on several occasions and is now aligned with the ITRF to within
a few centimetres worldwide. Global horizontal datums, such as the ITRF2000
or WGS84, are also called geocentric datums because they are geocentrically
positioned with respect to the centre of mass of the Earth. They became available
only recently (roughly after the 1960s), with advances in extra-terrestrial
positioning techniques.²

Since the size and shape of satellite orbits are directly related to the centre of mass
of the Earth, observations of natural or artificial satellites can be used to pinpoint
the centre of mass of the Earth, and hence the origin of the ITRS.³ This technique
can also be used for the realization of the global ellipsoids and datums at the
accuracy level required for large-scale mapping.

² Extra-terrestrial positioning techniques include Satellite Laser Ranging (SLR), Lunar Laser
Ranging (LLR), Global Positioning System (GPS), and Very Long Baseline Interferometry (VLBI),
among others.

To implement the ITRF in a region, a densification of control stations is needed
to ensure that there are enough coordinated reference points available in the
region. These control stations are equipped with permanently operating satellite
positioning equipment (i.e. GPS receivers and auxiliary equipment) and
communication links. Examples of (networks consisting of) such permanent tracking
stations are the AGRS in the Netherlands and the SAPOS in Germany.

We can easily transform ITRF coordinates (X, Y and Z in metres) into
geographic coordinates (φ, λ, h) with respect to the GRS80 ellipsoid without loss
of accuracy. However, the ellipsoidal height h, obtained through this
straightforward transformation, has no physical meaning and does not correspond to
the intuitive human perception of height. We therefore use the height H above
the Geoid (see Figure 4.8). It is foreseeable that global 3D spatial referencing,
in terms of (φ, λ, H), could become ubiquitous in the next 10-15 years. If all
published maps are also globally referenced by that time, the underlying
spatial referencing concepts will become transparent and hence redundant for GIS
users.
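
The transformation from geocentric to geographic coordinates can be sketched as follows. This is one common iterative formulation, given here as an illustration rather than as the method used by any particular software. It assumes the GRS80 defining constants (a = 6378137 m, 1/f = 298.257222101), is not valid exactly at the poles, and obtains H from h by subtracting the Geoid separation N of Figure 4.1. The sample coordinates and the Geoid separation are hypothetical.

```python
import math

A = 6378137.0                # GRS80 semi-major axis in metres
F = 1.0 / 298.257222101      # GRS80 flattening
E2 = F * (2.0 - F)           # first eccentricity squared, e^2 = 2f - f^2


def geocentric_to_geographic(x, y, z, iterations=10):
    """Convert geocentric (X, Y, Z) in metres to (lat, lon, h).

    Returns latitude and longitude in degrees and the ellipsoidal height h
    in metres, using a simple fixed-point iteration on the latitude.
    """
    lon = math.atan2(y, x)
    p = math.hypot(x, y)                     # distance from the rotation axis
    lat = math.atan2(z, p * (1.0 - E2))      # starting value, assuming h = 0
    h = 0.0
    for _ in range(iterations):
        # radius of curvature in the prime vertical at the current latitude
        n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1.0 - E2 * n / (n + h)))
    return math.degrees(lat), math.degrees(lon), h


def orthometric_height(h, geoid_separation):
    """Height above the Geoid: H = h - N, with N the Geoid separation."""
    return h - geoid_separation


# Hypothetical point and hypothetical Geoid separation of 45 m.
lat, lon, h = geocentric_to_geographic(3904000.0, 473000.0, 5005000.0)
print(lat, lon, h, orthometric_height(h, 45.0))
```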

Hundreds of existing local horizontal and vertical datums are still relevant
because they form the basis of map products all over the world. For the next few
years we will be required to deal with both local and global datums until the
former are eventually phased out. During the transition period, we will require
tools to transform coordinates from local horizontal datums to a global
horizontal datum and vice versa (see Section 4.1.4). The organizations that usually
develop transformation tools and make them available to the user community
are provincial or National Mapping Organizations (NMOs) and cadastral
authorities.

³ In the case of an idealized spherical Earth it is one of the focal points of the elliptical orbits.

Figure 4.8: Height h above the geocentric ellipsoid, and height H
above the Geoid. The first is measured orthogonal to the ellipsoid, the second
orthogonal to the Geoid.
4.1.2 Coordinate systems

As mentioned before, the special nature of spatial data obviously lies in it being
spatially referenced. Different kinds of coordinate systems are used to position
data in space. Here we distinguish between spatial and planar coordinate
systems. Spatial (or global) coordinate systems are used to locate data either on the
Earth's surface in a 3D space, or on the Earth's reference surface (ellipsoid or
sphere) in a 2D space. Below we discuss the geographic coordinate system in a
2D and 3D space and the geocentric coordinate system, also known as the 3D
Cartesian coordinate system. Planar coordinate systems, on the other hand, are
used to locate data on the flat surface of the map in a 2D space. We will discuss
the 2D Cartesian coordinate system and the 2D polar coordinate system.
2D Geographic coordinates (φ, λ)

The most widely used global coordinate system consists of lines of geographic
latitude (phi or φ or ϕ) and longitude (lambda or λ). Lines of equal latitude are
called parallels. They form circles on the surface of the ellipsoid.⁴ Lines of equal
longitude are called meridians and they form ellipses (meridian ellipses) on the
ellipsoid (Figure 4.9).


Figure 4.9: The latitude (φ) and longitude (λ) angles represent the 2D
geographic coordinate system.


The latitude (φ) of a point P (Figure 4.10) is the angle between the ellipsoidal
normal through P' and the equatorial plane. Latitude is zero on the equator
(φ = 0°), and increases towards the two poles to maximum values of φ = +90°
(N 90°) at the North Pole and φ = -90° (S 90°) at the South Pole.

The longitude (λ) is the angle between the meridian ellipse which passes through
Greenwich and the meridian ellipse containing the point in question. It is
measured in the equatorial plane from the meridian of Greenwich (λ = 0°) either
eastwards through λ = +180° (E 180°) or westwards through λ = -180° (W 180°).

⁴ The concept of geographic coordinates can also be applied to a sphere.

Latitude and longitude represent the geographic coordinates (φ, λ) of a point
P' (Figure 4.10) with respect to the selected reference surface. They are always
given in angular units. For example, the coordinates for City Hall in Enschede
are:⁵


φ = 52°13'26.2"N, λ = 6°53'32.1"E
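
GIS software usually stores such angles as signed decimal degrees rather than as degrees, minutes and seconds. The following Python sketch shows the conversion for the Enschede coordinates; the function name and the convention that southern latitudes and western longitudes are negative are assumptions made for this illustration.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds to signed decimal degrees.

    hemisphere is 'N', 'S', 'E' or 'W'; south and west give negative values.
    """
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value


lat = dms_to_decimal(52, 13, 26.2, "N")
lon = dms_to_decimal(6, 53, 32.1, "E")
print(f"{lat:.6f}, {lon:.6f}")  # prints 52.223944, 6.892250
```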


The graticule on a map represents the projected position of the geographic
coordinates (φ, λ) at constant intervals, or in other words the projected position
of selected meridians and parallels (Figure 4.13). The shape of the graticule
depends largely on the characteristics of the map projection and the scale of the
map.


⁵ This latitude and longitude refer to the Amersfoort datum. The use of a different reference
surface will result in a different latitude and longitude angle.
3D Geographic coordinates (φ, λ, h)

3D geographic coordinates (φ, λ, h) are obtained by introducing the ellipsoidal
height h to the system. The ellipsoidal height (h) of a point is the vertical distance
of the point in question above the ellipsoid. It is measured in distance units along
the ellipsoidal normal from the point to the ellipsoid surface. 3D geographic
coordinates can be used to define a position on the surface of the Earth (point P
in Figure 4.10).


Figure 4.10: The latitude (φ) and longitude (λ) angles and the ellipsoidal
height (h) represent the 3D geographic coordinate system.
3D Geocentric coordinates (X, Y, Z)

An alternative method of defining a 3D position on the surface of the Earth is
by means of geocentric coordinates (X, Y, Z), also known as 3D Cartesian
coordinates. The system has its origin at the mass-centre of the Earth with the X and
Y axes in the plane of the equator. The X-axis passes through the meridian of
Greenwich, and the Z-axis coincides with the Earth's axis of rotation. The three
axes are mutually orthogonal and form a right-handed system. Geocentric
coordinates can be used to define a position on the surface of the Earth (point P in
Figure 4.11).
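
The link between 3D geographic coordinates and geocentric coordinates can be written down directly. The Python sketch below uses the standard closed-form conversion; the default ellipsoid parameters are the GRS 1980 and WGS84 values of Table 4.1, and the sample point is hypothetical.

```python
import math


def geographic_to_geocentric(lat_deg, lon_deg, h, a=6378137.0, inverse_f=298.257):
    """Convert (lat, lon, h) on an ellipsoid to geocentric (X, Y, Z) in metres.

    lat_deg, lon_deg: latitude and longitude in degrees
    h:                ellipsoidal height in metres
    a, inverse_f:     semi-major axis and inverse flattening of the ellipsoid
    """
    f = 1.0 / inverse_f
    e2 = f * (2.0 - f)                      # e^2 = 2f - f^2
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # radius of curvature in the prime vertical
    n = a / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + h) * math.sin(lat)
    return x, y, z


# Hypothetical point at 52° N, 6° E, 50 m above the ellipsoid.
print(geographic_to_geocentric(52.0, 6.0, 50.0))
```

A point on the equator at the Greenwich meridian with h = 0 maps to (a, 0, 0), and the North Pole maps to (0, 0, b), which is a quick way of checking an implementation.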

It should be noted that the rotational axis of the Earth changes its position over
time (referred to as polar motion). To compensate for this, the mean position of
the pole in the year 1903 (based on observations between 1900 and 1905) has
been used to define the so-called 'Conventional International Origin' (CIO).

Figure 4.11: An illustration of the 3D geocentric coordinate system (see
text for further explanation).



2D Cartesian coordinates (X, Y)


A flat map has only two dimensions: width (left to right) and length (bottom
to top). Transforming the three-dimensional Earth into a two-dimensional map
is the subject of map projections and coordinate transformations (Section 4.1.3 and
Section 4.1.4). Here, as in several other cartographic applications,
two-dimensional Cartesian coordinates (x, y), also known as planar rectangular coordinates, are
used to describe the location of any point unambiguously.

The 2D Cartesian coordinate system is a system of intersecting perpendicular
lines, which contains two principal axes, called the X- and Y-axis. The
horizontal axis is usually referred to as the X-axis and the vertical as the Y-axis (note
that the X-axis is also sometimes called Easting and the Y-axis the Northing).
The intersection of the X- and Y-axis forms the origin. The plane is marked at
intervals by equally spaced coordinate lines, called the map grid.



Figure 4.12: An illustration of the 2D Cartesian coordinate system, showing a point P (244, 249) relative to the origin O (see text for further explanation). 



Given two numerical coordinates x and y for point P, one can now precisely and 
objectively specify any location P on the map (Figure 4.12). 

Normally, the coordinates x=0 and y=0 are given to the origin. However, sometimes large positive values are added to the origin coordinates. This is to avoid negative values for the x and y coordinates in case the origin of the coordinate system is located inside the area of interest. The point which then has the coordinates x=0 and y=0 is called the false origin. 

An example is the coordinate system used in the Netherlands. It is called Rijksdriehoekstelsel (RD). The system is based on the azimuthal stereographic projection (see Section 4.1.3) and the Bessel ellipsoid is used as reference surface. The origin of the coordinate system has been shifted (false origin) from the projection centre (Amersfoort) towards the south-west to avoid negative coordinates inside the country (see Figure 4.13). 
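
The shift to a false origin amounts to adding two constants to the projected coordinates. The following is a minimal sketch of that idea. The function names are hypothetical, and the offsets of 155,000 m and 463,000 m are the values commonly published for the RD system; they are not given in this text and should be treated as an assumption.

```python
# Assumed offsets for the Dutch RD system (not stated in the text).
FALSE_EASTING = 155_000.0   # metres added to every x coordinate
FALSE_NORTHING = 463_000.0  # metres added to every y coordinate


def to_grid(x_proj, y_proj):
    """Shift coordinates measured from the projection centre to grid coordinates."""
    return x_proj + FALSE_EASTING, y_proj + FALSE_NORTHING


def from_grid(x_grid, y_grid):
    """Undo the shift: grid coordinates back to projection-centre coordinates."""
    return x_grid - FALSE_EASTING, y_grid - FALSE_NORTHING


# The projection centre itself receives large positive grid coordinates.
centre = to_grid(0.0, 0.0)
# A point 140 km west and 150 km south of the centre is still positive.
south_west = to_grid(-140_000.0, -150_000.0)
```

With these offsets the false origin (grid coordinates 0, 0) lies 155 km west and 463 km south of the projection centre, so that points south-west of the centre keep positive coordinates.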

The grid on a map represents lines having constant 2D Cartesian coordinates (Figure 4.13). It is almost always a rectangular system and is used on large and medium scale maps to enable detailed calculations and positioning. The map grid is usually not used on small scale maps (about one to a million or smaller). Scale distortions that result from transforming the Earth's curved surface to the map plane are so great on small-scale maps that detailed calculations and positioning are difficult. 
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Figure 4.13: The co¬ 
ordinate system of the 
Netherlands represented 
by the map grid and the 
graticule. The origin of 
the coordinate system has 
been shifted (false origin) 
from the projection centre 
(Amersfoort) towards the 
South-West. 
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2D Polar coordinates {a, d) 


Another possibility of defining a point in a plane is by polar coordinates. This is the distance d from the origin to the point concerned and the angle α between a fixed (or zero) direction and the direction to the point. The angle α is called azimuth or bearing and is measured in a clockwise direction. It is given in angular units while the distance d is expressed in length units. 



Figure 4.14: An illustration of the 2D Polar coordinate system (see text for further explanation). 


Bearings are always related to a fixed direction (initial bearing) or a datum line. 
In principle, this reference line can be chosen freely. However, in practice three 
different directions are widely used: True North, Grid North and Magnetic North. 
The corresponding bearings are called: true (or geodetic) bearing, grid bearing 
and magnetic (or compass) bearing. 

Polar coordinates are often used in land surveying. For some types of surveying instruments it is advantageous to make use of this coordinate system. The development of precise remote distance measurement techniques has led to the virtually universal preference for the polar coordinate method in detailed surveys. 
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4.1.3 Map projections 

Maps are one of the world's oldest types of document. For quite some time 
it was thought that our planet was flat, and during those days, a map simply 
was a miniature representation of a part of the world. Now that we know that 
the Earth's surface is curved in a specific way, we know that a map is in fact a 
flattened representation of some part of the planet. The field of map projections 
concerns itself with the ways of translating the curved surface of the Earth into 
a flat map. 


A map projection is a mathematically described technique of how to represent the Earth's curved surface on a flat map. 


To represent parts of the surface of the Earth on a flat paper map or on a computer screen, the curved horizontal reference surface must be mapped onto the 2D mapping plane. The reference surface for large-scale mapping is usually an oblate ellipsoid, and for small-scale mapping, a sphere.* Mapping onto a 2D mapping plane means transforming each point on the reference surface with geographic coordinates (φ, λ) to a set of Cartesian coordinates (x, y) representing positions on the map plane (Figure 4.15). 

* In practice, maps at scale 1:1,000,000 or smaller can use the mathematically simpler sphere without the risk of large distortions. At larger scales, the more complicated mathematics of ellipsoids are needed to prevent these distortions in the map. 

Figure 4.15: Example of a map projection where the reference surface with geographic coordinates (φ, λ) is projected onto the 2D mapping plane with 2D Cartesian coordinates (x, y). 

The actual mapping cannot usually be visualized as a true geometric projection, directly onto the mapping plane (Figure 4.15). Instead, it is achieved through mapping equations. A forward mapping equation transforms the geographic coordinates (φ, λ) of a point on the curved reference surface to a set of planar Cartesian coordinates (x, y), representing the position of the same point on the map plane: 


$$(x, y) = f(\phi, \lambda)$$

The corresponding inverse mapping equation transforms mathematically the planar Cartesian coordinates (x, y) of a point on the map plane to a set of geographic coordinates (φ, λ) on the curved reference surface: 

$$(\phi, \lambda) = f^{-1}(x, y)$$

An example is the mapping equations used for the Mercator projection (spherical assumption) [51]. The forward mapping equation for the Mercator projection is:* 

$$x = R(\lambda - \lambda_0)$$

$$y = R \ln\left(\tan\left(\frac{\pi}{4} + \frac{\phi}{2}\right)\right)$$

The inverse mapping equation for the Mercator projection is: 

$$\phi = \frac{\pi}{2} - 2\arctan\left(e^{-y/R}\right)$$

$$\lambda = \frac{x}{R} + \lambda_0$$

* The equations are considerably more complicated than those introduced here when an ellipsoid is used as reference surface. R is the radius of the spherical reference surface at the scale of the map; φ and λ are given in radians; λ0 is the central meridian of the projection; e = 2.7182818, the base of the natural logarithms, not the eccentricity. 
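
The spherical Mercator equations above translate almost directly into code. The sketch below is illustrative only: the function names are hypothetical, and the default radius of 6,371,000 m (a mean Earth radius, i.e. a map at scale 1:1) is an assumption that does not come from the text.

```python
import math


def mercator_forward(phi, lam, R=6_371_000.0, lam0=0.0):
    """Forward mapping: geographic (phi, lam) in radians -> planar (x, y)."""
    x = R * (lam - lam0)
    y = R * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


def mercator_inverse(x, y, R=6_371_000.0, lam0=0.0):
    """Inverse mapping: planar (x, y) -> geographic (phi, lam) in radians."""
    phi = math.pi / 2 - 2 * math.atan(math.exp(-y / R))
    lam = x / R + lam0
    return phi, lam


# Round trip for a point at 52 degrees N, 5 degrees E,
# with an assumed central meridian of 3 degrees E.
lam0 = math.radians(3.0)
x, y = mercator_forward(math.radians(52.0), math.radians(5.0), lam0=lam0)
phi, lam = mercator_inverse(x, y, lam0=lam0)
```

Applying the inverse equation to the output of the forward equation should return the original latitude and longitude, up to floating-point rounding. Note that y grows without bound as φ approaches ±90°, so the poles cannot be mapped.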
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Classification of map projections 

Hundreds of map projections have been developed, each with its own specific qualities. These qualities in turn make resulting maps useful for certain purposes. By definition, any map projection is associated with scale distortions. There is simply no way to flatten out a piece of ellipsoidal or spherical surface without stretching some parts of the surface more than others. The amount and kind of distortions a map will have depend on the type of map projection that has been selected. 

Some map projections can be visualized as true geometric projections directly onto the mapping plane, in which case we call it an azimuthal projection, or onto an intermediate surface, which is then rolled out into the mapping plane. Typical choices for such intermediate surfaces are cones and cylinders. Such map projections are then called conical and cylindrical, respectively. Figure 4.16 shows the surfaces involved in these three classes of projections. 

The planar, conical, and cylindrical surfaces in Figure 4.16 are all tangent surfaces; they touch the horizontal reference surface in one point (plane) or along a closed line (cone and cylinder) only. Another class of projections is obtained if the surfaces are chosen to be secant to (to intersect with) the horizontal reference surface; illustrations are in Figure 4.17. Then, the reference surface is intersected along one closed line (plane) or two closed lines (cone and cylinder). Secant map surfaces are used to reduce or average out scale errors because the line(s) of intersection are not distorted on the map. 

Figure 4.16: Classes of map projections (cylindrical, conical and azimuthal). 

Figure 4.17: Three secant projection classes. 

In the geometric depiction of map projections in Figures 4.16 and 4.17, the symmetry axes of the plane, cone and cylinder coincide with the rotation axis of the ellipsoid or sphere, i.e. a line through the N and S poles. In this case, the projection is said to be a normal projection. The other cases are transverse projections (symmetry axis in the equator) and oblique projections (symmetry axis is somewhere between the rotation axis and equator of the ellipsoid or sphere). These cases are illustrated in Figure 4.18. 

Figure 4.18: A transverse and an oblique projection. 

The Universal Transverse Mercator (UTM) uses a transverse cylinder, secant to the horizontal reference surface. UTM is an important projection used worldwide. The projection is a derivation from the Transverse Mercator projection (also known as Gauss-Krüger or Gauss conformal projection). The UTM divides the world into 60 narrow longitudinal zones of 6 degrees, numbered from 1 to 60. The narrow zones of 6 degrees (and the secant map surface) make the distortions small enough for large scale topographic mapping. 
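
The division into 60 zones of 6 degrees means that a zone number follows directly from the longitude. The sketch below assumes the usual convention that zone 1 starts at 180° W and that numbering increases eastwards; this convention is not spelled out in the text. The function names are hypothetical, and the special zone boundaries used around Norway and Svalbard are deliberately ignored.

```python
import math


def utm_zone(lon_deg):
    """Return the UTM zone number (1..60) for a longitude in degrees.

    Assumes zone 1 covers 180 W to 174 W; regional exceptions are ignored.
    """
    return int(math.floor((lon_deg + 180.0) / 6.0)) % 60 + 1


def central_meridian(zone):
    """Return the central meridian of a zone in degrees east of Greenwich."""
    return zone * 6 - 183


# Amersfoort lies at roughly 5.39 degrees E (approximate value, for illustration).
zone = utm_zone(5.39)
meridian = central_meridian(zone)
```

For this longitude the sketch gives zone 31, whose central meridian is 3° E, so no point in the zone is more than 3 degrees of longitude away from the line along which the cylinder is centred.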

Normal cylindrical projections are typically used to map the world in its entirety. Conical projections are often used to map the different continents, while the normal azimuthal projection may be used to map the polar areas. Transverse and oblique aspects of many projections can be used for most parts of the world. 
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It is also of importance to consider the shape of the area to be mapped. Ideally, 
the general shape of the mapping area should match the distortion pattern of a specific projection. If an area is approximately circular it is possible to create a map that minimises distortion for that area on the basis of an azimuthal projection. The cylindrical projection is best for a rectangular area and a conic projection for a triangular area. 

So far, we have not specified how the curved horizontal reference surface is projected onto the plane, cone or cylinder. How this is done determines which kind of distortions the map will have compared to the original curved reference surface. The distortion properties of a map are typically classified according to what is not distorted on the map: 

• In a conformal map projection the angles between lines in the map are identical to the angles between the original lines on the curved reference surface. This means that angles (with short sides) and shapes (of small areas) are shown correctly on the map. 

• In an equal-area (equivalent) map projection the areas in the map are identical to the areas on the curved reference surface (taking into account the map scale), which means that areas are represented correctly on the map. 

• In an equidistant map projection the lengths of particular lines in the map are the same as the lengths of the original lines on the curved reference surface (taking into account the map scale). 

A particular map projection can have any one of these three properties. No map 
projection can be both conformal and equal-area, for example. 
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The most appropriate type of distortion property for a map depends largely on 
the purpose for which it will be used. Conformal map projections represent angles correctly, but as the region becomes larger, they show considerable area distortions (Figure 4.19). Maps used for the measurement of angles (e.g. aeronautical charts, topographic maps) often make use of a conformal map projection such as the UTM projection. 


Figure 4.19: The Mercator projection, a cylindrical map projection with a conformal property. The area distortions are significant towards the polar regions. 


Equal-area projections, on the other hand, represent areas correctly, but as the region becomes larger, they show considerable distortions of angles and consequently shapes (Figure 4.20). Maps which are to be used for measuring area (e.g. distribution maps) often make use of an equal-area map projection. 


Figure 4.20: The cylindrical equal-area projection, a cylindrical map projection with an equal-area property. The shape distortions are significant towards the polar regions. 


The equidistant property is achievable only to a limited degree. That is, true distances can be shown only from one or two points to any other point on the map or in certain directions. If a map is true to scale along the meridians (i.e. no distortion in North-South direction) we say that the map is equidistant along the meridians (e.g. the equidistant cylindrical projection) (Figure 4.21). If a map is true to scale along all parallels we say the map is equidistant along the parallels (i.e. no distortion in East-West direction). Maps which need to keep both area and angle distortions within reasonable limits (e.g. several thematic maps) often make use of an equidistant map projection. 

Based on these discussions, a particular map projection can be classified. An example would be the classification 'conformal conic projection with two standard parallels', meaning that the projection is a conformal map projection, that the intermediate surface is a cone, and that the cone intersects the ellipsoid (or sphere) along two parallels; i.e. the cone is secant and the cone's symmetry axis is parallel to the rotation axis. (This would amount to the projection of Figure 4.17, middle.) 

Often, a particular type of map projection is also named after its inventor (or first publisher). For example, the 'conformal conic projection with two standard parallels' is also referred to as 'Lambert's conical projection' [24]. 

Figure 4.21: The equidistant cylindrical projection (also called Plate Carrée projection), a cylindrical map projection with an equidistant property. The map is equidistant (true to scale) along the meridians. Both shape and area are reasonably well preserved. 
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4.1.4 Coordinate transformations 

Map and GIS users are mostly confronted in their work with transformations from one two-dimensional coordinate system to another. This includes the transformation of polar coordinates delivered by the surveyor into Cartesian map coordinates or the transformation from one 2D Cartesian (x, y) system of a specific map projection into another 2D Cartesian (x', y') system of a defined map projection. 

Datum transformations are transformations from a 3D coordinate system (i.e. horizontal datum) into another 3D coordinate system. These kinds of transformations are also important for map and GIS users, who usually collect spatial data in the field using satellite navigation technology and need to represent this data on a published map based on a local horizontal datum. 

We may relate an unknown coordinate system to a known coordinate system on the basis of a set of selected points whose coordinates are known in both systems. These points may be ground control points (GCPs) or common points such as corners of houses or road intersections, as long as they have known coordinates in both systems. Image and scanned data are usually transformed by this method. The transformations may be conformal, affine, polynomial, or of another type, depending on the geometric errors in the data set. These types of 2D Cartesian transformations are not covered in this textbook, but are discussed in Principles of Remote Sensing [53]. 
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2D Polar to 2D Cartesian transformations 

The transformation of polar coordinates (α, d) into Cartesian map coordinates (x, y) is done when field measurements (angular and distance measurements) are transformed into map coordinates. The equations for this transformation are: 

$$x = d \sin(\alpha)$$

$$y = d \cos(\alpha)$$

The inverse equations are: 

$$\alpha = \tan^{-1}\left(\frac{x}{y}\right)$$

$$d^2 = x^2 + y^2$$

A more realistic case makes use of a translation and a rotation to transform one 
system to the other. 
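
As a hedged illustration of the simple case (no translation or rotation), the sketch below converts a bearing and distance to map coordinates and back. The function names are hypothetical. For the inverse it uses the two-argument arctangent instead of a plain arctangent of x/y; this is a substitution made here so that bearings in all four quadrants, and points with y = 0, are handled.

```python
import math


def polar_to_cartesian(alpha_deg, d):
    """Bearing alpha (degrees, clockwise from the zero direction) and distance d -> (x, y)."""
    alpha = math.radians(alpha_deg)
    return d * math.sin(alpha), d * math.cos(alpha)


def cartesian_to_polar(x, y):
    """(x, y) -> bearing in degrees within [0, 360) and distance d."""
    alpha_deg = math.degrees(math.atan2(x, y)) % 360.0
    return alpha_deg, math.hypot(x, y)


# A point 100 length units away at a bearing of 225 degrees (south-west).
x, y = polar_to_cartesian(225.0, 100.0)
alpha_deg, d = cartesian_to_polar(x, y)
```

Because the bearing is measured clockwise from the zero direction, the sine gives the x (Easting) component and the cosine the y (Northing) component, which is the reverse of the usual mathematical convention.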
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Changing map projection 

Forward and inverse mapping equations are normally used to transform data from one map projection to another. The inverse equation of the source projection is used first to transform source projection coordinates (x, y) to geographic coordinates (φ, λ). Next, the forward equation of the target projection is used to transform the geographic coordinates (φ, λ) into target projection coordinates (x', y'). The first equation takes us from a projection A into geographic coordinates. The second takes us from geographic coordinates (φ, λ) to another map projection B. These principles are illustrated in Figure 4.22. 
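
The two-step principle can be sketched for two simple spherical projections. In the example below, projection A is the Mercator projection and projection B is the equidistant cylindrical projection, both on the same sphere, so no datum change is involved. The function names and the sphere radius are assumptions made for illustration.

```python
import math

R = 6_371_000.0  # assumed radius of the spherical reference surface, in metres


def mercator_forward(phi, lam, lam0=0.0):
    """Forward equation of projection A (used here only to create test input)."""
    return R * (lam - lam0), R * math.log(math.tan(math.pi / 4 + phi / 2))


def mercator_inverse(x, y, lam0=0.0):
    """Inverse equation of projection A: (x, y) -> (phi, lam) in radians."""
    return math.pi / 2 - 2 * math.atan(math.exp(-y / R)), x / R + lam0


def equidistant_cylindrical_forward(phi, lam, lam0=0.0):
    """Forward equation of projection B: (phi, lam) in radians -> (x', y')."""
    return R * (lam - lam0), R * phi


def change_projection(x, y):
    """Step 1: inverse of A to geographic coordinates. Step 2: forward of B."""
    phi, lam = mercator_inverse(x, y)
    return equidistant_cylindrical_forward(phi, lam)


x_a, y_a = mercator_forward(math.radians(60.0), math.radians(5.0))
x_b, y_b = change_projection(x_a, y_a)
```

In this pair of projections the x coordinate is unchanged, while the y coordinate shrinks considerably at 60° latitude, which reflects the strong north-south stretching of the Mercator projection at high latitudes.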

Historically, a GIS has handled data referenced spatially with respect to the (x, y) 
coordinates of a specific map projection. For GIS application domains requiring 
3D spatial referencing, a height coordinate may be added to the (x, y) coordinate 
of the point. The additional height coordinate can be a height H above mean sea 
level, which is a height with a physical meaning. These (x, y, H) coordinates can 
be used to represent the location of objects in a 3D GIS. 
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Figure 4.22; The principle 
of changing from one map 
projection into another. 
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Datum transformations 

A change of map projection may also include a change of the horizontal datum. This is the case when the source projection is based upon a different horizontal datum than the target projection. If the difference in horizontal datums is ignored, there will not be a perfect match between adjacent maps of neighbouring countries or between overlaid maps originating from different projections. It may result in up to several hundred metres difference in the resulting coordinates. Therefore, spatial data with different underlying horizontal datums may need a so-called datum transformation. 

Suppose we wish to transform spatial data from the UTM projection to the Dutch 
RD system, and that the data in the UTM system are related to the European 
Datum 1950 (ED50), while the Dutch RD system is based on the Amersfoort 
datum. In this example the change of map projection should be combined with a 
datum transformation step for a perfect match. This is illustrated in Figure 4.23. 

The inverse equation of projection A is used first to take us from the map coordinates (x, y) of projection A to the geographic coordinates (φ, λ, h) in datum A. A height coordinate (h or H) may be added to the (x, y) map coordinates. Next, the datum transformation takes us from these coordinates to the geographic coordinates (φ, λ, h) in datum B. Finally, the forward equation of projection B is used to take us from the geographic coordinates (φ, λ, h) in datum B to the map coordinates (x', y') of projection B. 

Mathematically a datum transformation is feasible via the geocentric coordinates (x, y, z), or directly by relating the geographic coordinates of both datum systems. The latter relates the ellipsoidal latitude (φ) and longitude (λ), and possibly also the ellipsoidal height (h), of both datum systems [28]. 

Figure 4.23: The principle of changing from one projection into another, combined with a datum transformation from datum A to datum B. 

We can easily transform geographic coordinates (φ, λ, h) into geocentric coordinates (x, y, z), and the other way around. The datum transformation via the geocentric coordinates implies a 3D similarity transformation. Essentially, this is a transformation between two orthogonal 3D Cartesian spatial reference frames together with some elementary tools from adjustment theory. The transformation is usually expressed with seven parameters: three rotation angles (α, β, γ), three origin shifts (X0, Y0, Z0) and one scale factor (s). The input in the process are coordinates of points in datum A and coordinates of the same points in datum B. The output is an estimate of the seven transformation parameters and a measure of the likely error of the estimate. 
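
The first step, from geographic to geocentric coordinates, is not written out in the text. A minimal sketch using the standard closed-form relations for an ellipsoid of revolution is given below. The function name is hypothetical, and the default semi-major axis and flattening are those of the WGS84 ellipsoid, chosen here only as an assumption; any other ellipsoid can be passed in.

```python
import math


def geographic_to_geocentric(phi_deg, lam_deg, h, a=6_378_137.0, f=1 / 298.257223563):
    """Ellipsoidal latitude, longitude (degrees) and height (m) -> geocentric (X, Y, Z) in m.

    a is the semi-major axis and f the flattening of the reference ellipsoid
    (WGS84 values by default, which is an assumption of this sketch).
    """
    phi = math.radians(phi_deg)
    lam = math.radians(lam_deg)
    e2 = f * (2.0 - f)  # first eccentricity squared
    # Radius of curvature in the prime vertical at latitude phi.
    n = a / math.sqrt(1.0 - e2 * math.sin(phi) ** 2)
    x = (n + h) * math.cos(phi) * math.cos(lam)
    y = (n + h) * math.cos(phi) * math.sin(lam)
    z = (n * (1.0 - e2) + h) * math.sin(phi)
    return x, y, z


on_equator = geographic_to_geocentric(0.0, 0.0, 0.0)
at_pole = geographic_to_geocentric(90.0, 0.0, 0.0)
```

On the equator at the Greenwich meridian the result lies on the X-axis at a distance equal to the semi-major axis, and at the pole it lies on the Z-axis at a distance equal to the semi-minor axis, in line with the axis definitions of the geocentric system.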

Datum transformation parameters have to be estimated on the basis of a set of selected points whose coordinates are known in both datum systems. If the coordinates of these points are not correct (often the case for points measured on a local datum system), the estimated parameters may be inaccurate. As a result the datum transformation will be inaccurate. This is often the case when we transform coordinates from a local horizontal datum to a global geocentric datum. The coordinates in the local horizontal datum may be distorted by several tens of metres because of the inherent inaccuracies of the measurements used in the triangulation network. These inherent inaccuracies are also responsible for another complication: the transformation parameters are not unique. Their estimate will depend on which particular common points are chosen, and also on whether all seven transformation parameters, or only a subset of them, are estimated. 

Here is an illustration of what we may expect. The example below is concerned with the transformation of the Cartesian coordinates of a point in the state of Baden-Württemberg, Germany, from ITRF to Cartesian coordinates in the Potsdam Datum. Sets of numerical values for the transformation parameters are available from three organizations (Table 4.2): 

Parameter       National set       Provincial set     NIMA set
scale   s       1 - 8.3 · 10^-6    1 - 9.2 · 10^-6    1
angles  α       +1.04"             +0.32"             none
        β       +0.35"             +3.18"             none
        γ       -3.08"             -0.91"             none
shifts  X0      -581.99 m          -518.19 m          -635 m
        Y0      -105.01 m          -43.58 m           -27 m
        Z0      -414.00 m          -466.14 m          -450 m

Table 4.2: Three different sets of datum transformation parameters from three different organizations for transforming a point from ITRF to the Potsdam datum. 


1. The set provided by the federal mapping organization of Germany (labelled 'National set' in Table 4.2) was calculated using common points distributed throughout Germany. This set contains all seven parameters and is valid for all of Germany. 

2. The set provided by the mapping organization of Baden-Württemberg (labelled 'Provincial set' in Table 4.2) has been calculated using common points distributed throughout the province of Baden-Württemberg. This set contains all seven parameters and is valid only within the borders of that province. 

3. The set provided by the National Imagery and Mapping Agency (NIMA) of the USA (labelled 'NIMA set' in Table 4.2) has been calculated using common points distributed throughout Germany and based on the ITRF. This set contains a coordinate shift only (no rotations, and scale equals unity). It is valid for all of Germany. 


The three sets of transformation parameters vary by several tens of metres, for the aforementioned reasons. These sets of transformation parameters have been used to transform the ITRF Cartesian coordinates of a point in the state of Baden-Württemberg. The ITRF (X, Y, Z) coordinates are: 

(4,156,939.96 m, 671,428.74 m, 4,774,958.21 m). 


The three sets of transformed coordinates in the Potsdam datum are given in Table 4.3. The three sets of transformed coordinates agree at the level of a few metres. In a different country, the agreement could be at the level of centimetres, or tens of metres; this depends primarily on the quality of implementation of the local horizontal datum. It is advisable that GIS users act with caution when dealing with datum transformations and that they consult with their national mapping organization, wherever appropriate. 


Potsdam coordinates   National set      Provincial set    NIMA set
X                     4,156,305.32 m    4,156,306.94 m    4,156,304.96 m
Y                     671,404.31 m      671,404.64 m      671,401.74 m
Z                     4,774,508.25 m    4,774,511.10 m    4,774,508.21 m

Table 4.3: Three sets of transformed coordinates for a point in the state of Baden-Württemberg. 
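
The NIMA column of Table 4.3 can be reproduced with a few lines of code, because that parameter set consists of three shifts only. The sketch below implements a generic 3D similarity transformation of the form shift + scale × rotation × point. The function name is hypothetical. The rotation is passed in as a ready-made 3×3 matrix, because the sign and order conventions for the three rotation angles differ between organizations and are not specified in this text; the two seven-parameter sets are therefore not evaluated here.

```python
def similarity_transform(point, shifts, scale=1.0, rotation=None):
    """Apply shifts + scale * (rotation x point) to a 3D Cartesian point.

    rotation is a 3x3 matrix given as nested lists; None means no rotation.
    """
    if rotation is None:
        rotation = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    rotated = [sum(rotation[i][j] * point[j] for j in range(3)) for i in range(3)]
    return tuple(shifts[i] + scale * rotated[i] for i in range(3))


# ITRF coordinates of the example point (metres), as given in the text.
itrf = (4_156_939.96, 671_428.74, 4_774_958.21)
# NIMA set from Table 4.2: shifts only, no rotations, scale equal to unity.
nima_shifts = (-635.0, -27.0, -450.0)

potsdam_nima = similarity_transform(itrf, nima_shifts)
```

Up to floating-point rounding, the result agrees with the NIMA column of Table 4.3: (4,156,304.96 m, 671,401.74 m, 4,774,508.21 m).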
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4.2 Satellite-based positioning 


The previous section has noted the importance of satellites in spatial referencing. Satellites have allowed us to realize geocentric reference systems, and to increase the level of spatial accuracy substantially. They are critical tools in geodetic engineering for the maintenance of the ITRF. They also play a key role in mapping, surveying, and in a growing number of applications requiring positioning techniques. Nowadays, for fieldwork that includes spatial data acquisition, the use of satellite-based positioning is considered indispensable. 

Satellite-based positioning was developed and implemented to address military needs, somewhat analogously to the early development of the internet. The technology is now widely available for civilian use. The requirements for the development of the positioning system were: 

• Suitability for all kinds of military use: ground troops and vehicles, aircraft 
and missiles, ships; 

• Requiring only low-cost equipment with low energy consumption at the 
receiver end; 

• Provision of results in real time for an unlimited number of users concurrently; 

• Support for different levels of accuracy (military versus civilian); 

• Around-the-clock and weather-proof availability; 

• Use of a single geodetic datum; 
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• Protection against intentional and unintentional disturbance, for instance, 
through a design allowing for redundancy. 

A satellite-based positioning system set-up involves implementation of three 
hardware segments: 

1. The space segment, i.e. the satellites that orbit the Earth, and the radio signals that they emit, 

2. The control segment, i.e. the ground stations that monitor and maintain the 
space segment components, and 

3. The user segment, i.e. the users with their hard- and software to conduct 
positioning. 


In satellite positioning, the central problem is to determine the values (X, Y, Z) of a receiver that receives satellite signals, i.e. to determine the position of the receiver with a stated accuracy and precision. The required accuracy and precision depend on the application. So does timeliness, i.e. whether the position values are required in real time or can be determined later during post-processing. Finally, some applications like navigation require kinematic approaches, which take into account the fact that the receiver is not stationary, but is moving. 

In the remainder of this section, we discuss some of the fundamentals of satellite-based positioning, having in mind especially the geoscientist who wants to make use of it. 
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4.2.1 Absolute positioning 

The working principles of absolute, satellite-based positioning are fairly simple: 

1. A satellite, equipped with a clock, at a specific moment sends a radio message that includes 

(a) the satellite identifier, 

(b) its position in orbit, and 

(c) its clock reading. 

2. A receiver on or above the planet, also equipped with a clock, receives the 
message slightly later, and reads its own clock. 

3. From the time delay observed between the two clock readings, and knowing the speed of radio transmission through the medium between (satellite) sender and receiver, the receiver can compute the distance to the sender, also known as the satellite's pseudorange. 


The pseudorange of a satellite with respect to a receiver, is its apparent 
distance to the receiver, computed from the time delay with which its radio 
signal is received. 
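
The pseudorange computation itself is a single multiplication. The sketch below assumes that the signal travels at the speed of light in a vacuum, which ignores delays in the atmosphere, and uses a hypothetical function name and made-up clock readings.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second, in a vacuum


def pseudorange(t_sent, t_received):
    """Apparent distance (m) from the satellite clock reading at transmission
    and the receiver clock reading at reception, both in seconds."""
    return SPEED_OF_LIGHT * (t_received - t_sent)


# A signal that appears to have travelled for 0.07 s.
rho = pseudorange(0.0, 0.07)
# The same signal if the receiver clock runs one microsecond fast.
rho_biased = pseudorange(0.0, 0.07 + 1e-6)
range_error = rho_biased - rho
```

A delay of 0.07 s corresponds to roughly 21,000 km, and a receiver clock error of only one microsecond changes the computed distance by about 300 m. This is why the distance is called a pseudorange: it is only as good as the two clocks involved.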


Such a computation determines the position of the receiver to be on a sphere of radius equal to the computed pseudorange (refer to Figure 4.24(a)). If the receiver instantaneously does the same with a message from another satellite that is positioned elsewhere, the position of the receiver is restricted to another sphere. The intersection of the two spheres, which have different centres, determines a circle as the set of possible positions of the receiver (refer to Figure 4.24(b)). If a third satellite message is taken into consideration, the intersection of three spheres determines at most two positions, one of which is the actual position of the receiver. In most, if not all, practical situations where two positions result, one of them is a highly unlikely position for a signal receiver. The overall procedure is known as trilateration: the determination of a position based on three distances. 



Figure 4.24: Pseudorange positioning: (a) With just one satellite the position is determined by a sphere. (b) With two satellites, it is determined by the intersection of two spheres, a circle. Not shown: with three satellites, it is the intersection of three spheres. 


It would appear therefore that the signals of three satellites would suffice to determine a positional fix for our receiver. In theory this is true, but in practice it is not. The reason is that we have made the assumption that all satellite clocks as well as our receiver clock are fully synchronized, where in fact they are not. The satellite clocks are costly, high-precision, atomic clocks that we can consider synchronized for the time being, but the receiver typically has a far cheaper, quartz clock that is not synchronized with the satellite clocks. This brings into play an additional unknown parameter, namely the synchronization bias of the receiver clock, i.e. the difference in time reading between it and the satellite clocks. 

Our set of unknown variables has now become (X, Y, Z, Δt), representing a 3D position and a clock bias. By including the information obtained from a fourth satellite message, we can solve the problem (see Figure 4.25). This will result in the determination of the receiver's actual position (X, Y, Z), as well as its receiver clock bias Δt, and if we correct the receiver clock for this bias we effectively turn it into a high-precision, atomic clock as well! 
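
To make the four-unknown problem concrete, the sketch below solves for (X, Y, Z, Δt) from exactly four pseudoranges with a Gauss-Newton iteration started at the Earth's mass-centre. This is one common way of solving such equations and is not necessarily what any particular receiver does. It assumes each pseudorange equals the true distance plus c·Δt; the satellite positions, function names and iteration count are invented for illustration.

```python
import math

C = 299_792_458.0  # speed of light in a vacuum, m/s


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def solve_position(sats, pseudoranges, iterations=20):
    """Estimate receiver position (m) and clock bias (s) from four satellites."""
    x, y, z, b = 0.0, 0.0, 0.0, 0.0  # b is the clock bias expressed in metres
    for _ in range(iterations):
        jac, res = [], []
        for (sx, sy, sz), rho in zip(sats, pseudoranges):
            dist = math.dist((x, y, z), (sx, sy, sz))
            # Partial derivatives of (dist + b) with respect to x, y, z and b.
            jac.append([(x - sx) / dist, (y - sy) / dist, (z - sz) / dist, 1.0])
            res.append(rho - (dist + b))
        dx, dy, dz, db = solve_linear(jac, res)
        x, y, z, b = x + dx, y + dy, z + dz, b + db
    return x, y, z, b / C


# Invented satellite positions (m) and a simulated receiver with a 1 ms clock bias.
sats = [(20.0e6, 0.0, 17.5e6), (10.0e6, 20.0e6, 14.0e6),
        (12.0e6, -18.0e6, 15.0e6), (3.0e6, 2.0e6, 26.0e6)]
truth = (4_156_939.96, 671_428.74, 4_774_958.21)
bias = 1.0e-3
ranges = [math.dist(truth, s) + C * bias for s in sats]
X, Y, Z, dt = solve_position(sats, ranges)
```

In this simulation the 1 ms clock bias inflates every pseudorange by about 300 km, yet the solution recovers both the position and the bias, because the bias is common to all four measurements.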

Obtaining a high-precision clock is a fortunate side-effect of using the receiver, as it allows the design of experiments distributed in geographic space that demand high levels of synchrony. One such application is the use of wireless sensor networks for various natural phenomena like earthquakes, meteorological patterns or in water management. 

Another application is in the positioning of mobile phone users making an emergency call. Often the caller does not know their location accurately. The telephone company can trace back the call to the receiving transmitter mast, but this may be servicing an area with a radius of 300 m to 6 km. That is too inaccurate a position for an emergency ambulance to go to. However, if all masts in the telephony network are equipped with a satellite positioning receiver (and thus, with a very good, synchronized clock) the time of reception of the call at each mast can be recorded. The time difference of arrival of the call between two nearby masts determines a hyperbola on the ground of possible positions of the caller; if the call is received on three masts, we would have two hyperbolas, allowing intersection, and thus 'hyperbolic positioning'. With current technology the (horizontal) accuracy would be better than 30 m. 

Returning to the subject of satellite-based positioning, when only three and not
four satellites are 'in view', the receiver is capable of falling back from the
above 3D positioning mode to the inferior 2D positioning mode. With the relative
abundance of satellites in orbit around the earth, this is a relatively rare
situation, but it serves to illustrate the importance of 3D positioning.

If a 3D fix has already been obtained, the receiver simply assumes that the
height above the ellipsoid has not changed since the last 3D fix. If no fix had
yet been obtained, the receiver assumes that it is positioned at the geocentric
ellipsoid adopted by the positioning system, i.e. at height h = 0.⁸ In the
receiver computations, the ellipsoid fills the slot of the missing fourth
satellite sphere, and the unknown variables can therefore still be determined.
Clearly in both of these cases, the assumption for this computation is flawed
and the positioning results in 2D mode will be unreliable, much more so if no
previous fix had been obtained and one's receiver is not at all near the surface
of the geocentric ellipsoid.


⁸ Any receiver is capable of transforming a triad (X, Y, Z), using a
straightforward mathematical transformation, into an equivalent triad
(φ, λ, h), where h is the height above the geocentric ellipsoid.
Figure 4.25: Four satellites are needed to obtain a 3D position fix.
Pseudoranges are indicated for each satellite as dashed circles representing a
sphere, as well as the actual range as a normal circle, being the pseudorange
plus the range error caused by receiver clock bias.

Time, clocks and world time

During most of human history, the determination of time and position have gone
hand in hand. This was probably true of many civilizations in Asia and Arabia
before the Christian calendar as witnessed by remnants of various time keeping
constructions, as well as for early civilizations in Latin America; it was
certainly true for the European seafarer explorers of the 15th through to the
18th century. While latitude was determined with a sextant from the position of
the Sun in the sky, they carried clocks with them to determine the longitude of
their position. Early ship clocks were notoriously unreliable, having a drift of
multiple seconds a day, which could result in positional error of a few
kilometres.
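
The order of magnitude quoted here can be checked with a small calculation. The
sketch below assumes a spherical Earth with the WGS84 equatorial radius and uses
the fact that the Earth turns through 360° in 24 hours, so that every second of
clock error shifts the computed longitude by 15 seconds of arc. The function
name is ours, chosen for illustration only.

```python
import math

EARTH_EQUATORIAL_RADIUS_M = 6_378_137.0  # WGS84 semi-major axis (assumed sphere)
DEGREES_PER_SECOND = 360.0 / 86_400.0    # Earth's rotation: 15 degrees per hour


def longitude_error_m(clock_error_s, latitude_deg=0.0):
    """East-west position error caused by a clock error, at a given latitude."""
    dlon_rad = math.radians(clock_error_s * DEGREES_PER_SECOND)
    parallel_radius = EARTH_EQUATORIAL_RADIUS_M * math.cos(math.radians(latitude_deg))
    return parallel_radius * dlon_rad


per_second_equator = longitude_error_m(1.0)   # roughly 464 m per second of time
five_seconds_equator = longitude_error_m(5.0) # a few kilometres
```

A clock that is wrong by five seconds thus misplaces a ship at the equator by
more than two kilometres in the east-west direction; the error shrinks with the
cosine of the latitude.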

Before any notion of standard time existed, villages and cities simply kept
track of their local time determined from the position of the Sun in the sky.
When trains became an important means of transportation, these local time
systems became problematic as the schedules required a single time system. Such
a time system needed the definition of time zones: typically as 24 geographic
strips between certain longitudes that are multiples of 15°. This all gave rise
to Greenwich Mean Time (GMT). GMT was the world time standard of choice. It was
a system based on the mean solar time at the meridian of Greenwich, United
Kingdom, which is the conventional 0-meridian in geography.

GMT was later replaced by Universal Time (UT), a system still based on meridian
crossings of stars, but now of far away quasars as this provides more accuracy
than that of the Sun. It is still the case that the rotational velocity of our
planet is not constant and the length of a solar day is increasing. So UT is not
a perfect system either. It continues to be used for civil clock time, but it is
officially now replaced by International Atomic Time (TAI). UT actually has
various versions, amongst which are UT0, UT1 and UTC. UT0 is the Earth
rotational time observed in some location. Because the Earth experiences polar
motion as well, UT0 differs between locations. If we correct for polar motion,
we obtain UT1, which is identical everywhere. It is still a somewhat erratic
clock because of the earlier mentioned varying rotational velocity of the
planet. The uncertainty is about 3 msec per day.

Coordinated Universal Time (UTC) is used in satellite positioning, and is
maintained with atomic clocks. By convention, it is always within a margin of
0.9 sec of UT1, and twice annually it may be given a shift to stay within that
margin. This occasional shift of a leap second is applied at the end of June 30
or preferably at the end of December 31. The last minute of such a day is then
either 59 or 61 seconds long. So far, adjustments have always been to add a
second. UTC time can only be determined to the highest precision after the fact,
as atomic time is determined by the reconciliation of the observed differences
between a number of atomic clocks maintained by different national time bureaus.

In recent years we have learned to measure distance, therefore also position,
with clocks using satellite signals. The conversion factor is the speed of
light, approximately 3 · 10⁸ m/s in vacuum. No longer can multiple seconds of
clock bias be allowed, and this is where atomic clocks come in. They are very
accurate time keepers, based on the exactly known frequency with which specific
atoms (Cesium, Rubidium and Hydrogen) make discrete energy state jumps.
Positioning satellites usually have multiple clocks onboard; ground control
stations have even better quality atomic clocks.

Atomic clocks, however, are not flawless: their timing tends to drift away from
true time somewhat, and they too need to be corrected. The drift, and the change
in drift over time, are monitored, and are part of the satellite's navigation
message, so that they can be corrected for.

4.2.2 Errors in absolute positioning

Before we continue discussing other modes of satellite-based positioning, let us 
take a close look at the potential for error in absolute positioning. Receiver users 
are required to be sufficiently familiar with the technology in order to avoid 
true operating blunders such as bad receiver placement or incorrect receiver 
software settings, which can render the results virtually useless. We will skip 
over many of the physical and mathematical details underlying these errors, but 
they are mentioned here to raise awareness and understanding with users of 
this technology. For background information on the calculation of positional
error (specifically, the calculation of RMSE or root mean square error), readers
are referred to Section 5.2.2.
Errors related to the space segment

As a first source of error, the operators of the control segment may
intentionally deteriorate radio signals of the satellites to the general public,
to avoid optimal use of the system by the enemy, for instance in times of global
political tension and war. This selective availability (meaning that the
military forces allied with the control segment will still have access to
undisturbed signals) may cause error that is an order of magnitude larger than
all other error sources combined.⁹

⁹ Selective availability was stopped at the beginning of May 2000, and in late
2007 the White House decided to remove selective availability capabilities
completely. However, the US government still has a range of capabilities and
technology to implement regional denial of service of civilian GPS signals when
needed in an area of conflict, effectively producing the same result.

Secondly, the satellite message may contain incorrect information. Assuming
that it will always know its own identifier, the satellite may make two kinds of
error:

1. Incorrect clock reading: Even atomic clocks can be off by a small margin,
and since Einstein, we know that travelling clocks are slower than resident
clocks, due to a so-called relativistic effect. If one understands that a clock
that is off by 0.000001 sec causes a computation error in the satellite's
pseudorange of approximately 300 m, it is clear that these satellite clocks
require very strict monitoring.

2. Incorrect orbit position: The orbit of a satellite around our planet is easy
to describe mathematically if both bodies are considered point masses, but in
real life they are not. For the same reasons that the Geoid is not a simply
shaped surface, the Earth's gravitation field that a satellite experiences in
orbit is not simple either. Moreover, it is disturbed by solar and lunar
gravitation, making its flight path slightly erratic and difficult to forecast
exactly.
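
The 300 m figure in the first item follows directly from multiplying the clock
error by the speed of light. A minimal sketch, with a function name of our own
choosing:

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def range_error_m(clock_error_s):
    """Pseudorange error produced by a given clock error."""
    return C * clock_error_s


microsecond_error = range_error_m(1e-6)  # about 300 m
nanosecond_error = range_error_m(1e-9)   # about 0.3 m
```

Read the other way round, a range accuracy of about one metre requires clock
readings that are good to roughly three nanoseconds.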

Both types of error are strictly monitored by the ground control segment, which
is responsible for correcting any errors of this nature, but it does so by
applying an agreed-upon tolerance. A control station can obviously compare
results of positioning computations such as those discussed above with its
accurately known position, flagging any unacceptable errors, and potentially
labelling a satellite as temporarily 'unhealthy' until errors have been
corrected and brought to within the tolerance. This may be done by uploading a
correction on the clock or orbit settings to the satellite.
Errors related to the medium

Thirdly, the medium between sender and receiver may influence the radio
signals. The middle atmospheric layers of strato- and mesosphere are relatively
harmless and of little hindrance to radio waves, but this is not true of the
lower and upper layer. They are, respectively:

• The troposphere: the approximately 14 km high airspace just above the Earth's
surface, which holds much of the atmosphere's oxygen and which envelops all
phenomena that we call the weather. It is an obstacle that delays radio waves in
a rather variable way.

• The ionosphere: the outermost part of the atmosphere that starts at an
altitude of 90 km, holding many electrically charged atoms, thereby forming a
protection against various forms of radiation from space, including to some
extent radio waves. The degree of ionization shows a distinct night and day
rhythm, and also depends on solar activity.

The latter is a more severe source of delay to satellite signals, which
obviously means that pseudoranges are estimated larger than they actually are.
When satellites emit radio signals at two or more frequencies, an estimate can
be computed from differences in delay incurred for signals of different
frequency, and this will allow for the correction of atmospheric delay, leading
to a 10-50% improvement of accuracy. If this is not the case, or if the receiver
is capable of receiving just a single frequency, a model should be applied to
forecast the (especially ionospheric) delay, typically taking into account the
time of day and current latitude of the receiver.
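
How two frequencies reveal the ionospheric delay can be sketched as follows. The
sketch relies on the standard first-order approximation that the ionospheric
delay is inversely proportional to the square of the signal frequency; higher-
order effects, tropospheric delay and measurement noise are ignored. The two GPS
frequencies are those given in Section 4.2.6, while the function names and the
sample ranges are invented.

```python
F_L1 = 1575.42e6  # GPS L1 frequency, Hz
F_L2 = 1227.60e6  # GPS L2 frequency, Hz


def ionosphere_free_range(p1, p2, f1=F_L1, f2=F_L2):
    """Combine two pseudoranges so that the 1/f^2 delay term cancels."""
    g1, g2 = f1 * f1, f2 * f2
    return (g1 * p1 - g2 * p2) / (g1 - g2)


def ionospheric_delay_l1(p1, p2, f1=F_L1, f2=F_L2):
    """Estimate the delay on the first frequency from the range difference."""
    g1, g2 = f1 * f1, f2 * f2
    return g2 / (g1 - g2) * (p2 - p1)


# Invented example: a 5 m delay on L1 becomes a larger delay on L2.
true_range = 22_000_000.0
delay_l1 = 5.0
delay_l2 = delay_l1 * (F_L1 / F_L2) ** 2

p1 = true_range + delay_l1
p2 = true_range + delay_l2
corrected = ionosphere_free_range(p1, p2)  # close to true_range
```

A single-frequency receiver has no second measurement to difference against,
which is why it must fall back on a forecast model of the delay.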
Errors related to the receiver's environment


Fourth in this list is the error occurring when a radio signal is received via
two or more paths between sender and receiver, some of which typically involve a
bounce off some nearby surface, like a building or rock face. The term applied
to this phenomenon is multi-path; when it occurs the multiple receptions of the
same signal may interfere with each other (see Figure 4.26). Multi-path is an
error source that is difficult to avoid.



Figure 4.26: At any point in time, a number of satellites will be above the
receiver's horizon. But not all of them will be 'in view' (like the left and
right satellites), and for others multi-path signal reception may occur.


All of the above error sources have an influence on the computation of a
satellite's pseudorange. In accumulation, they are called the user equivalent
range error (UERE). Some error sources may be at work for all satellites being
used by the receiver, for instance, selective availability and the atmospheric
delay, while others may be specific to one satellite, for instance, incorrect
satellite information and multi-path.

Errors related to the relative geometry of satellites and receiver


There is one more source of error that is unrelated to individual radio signal
characteristics, but that rather depends on the combination of the satellite
signals used for positioning. Of importance is their constellation in the sky
from the receiver perspective. Referring to Figure 4.27, one will understand
that the sphere intersection technique of positioning will provide more precise
results when the four satellites are nicely spread over the sky, and thus that
the satellite constellation of Figure 4.27(b) is preferred over the one of
Figure 4.27(a). This error source is known as geometric dilution of precision
(GDOP). GDOP is lower when satellites are just above the horizon in mutually
opposed compass directions. However, such satellite positions have bad
atmospheric delay characteristics, so in practice it is better if they are at
least 15° above the horizon. When more than four satellites are in view, modern
receivers use 'least-squares' adjustment to calculate the best positional fix
possible from all of the signals. This gives a better solution than just using
the "best four", as was done previously.
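
The effect of satellite geometry can be made concrete with the commonly used
definition of GDOP as the square root of the trace of (AᵀA)⁻¹, where each row of
A holds the unit vector from receiver to satellite followed by a 1 for the clock
term. The sketch below assumes this definition and describes each satellite by
its azimuth and elevation as seen from the receiver; the two constellations and
all function names are invented for illustration.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gdop(az_el_deg):
    """GDOP for satellites given as (azimuth, elevation) pairs in degrees."""
    rows = []
    for az, el in az_el_deg:
        az, el = math.radians(az), math.radians(el)
        rows.append([math.cos(el) * math.sin(az),  # east component
                     math.cos(el) * math.cos(az),  # north component
                     math.sin(el),                 # up component
                     1.0])                         # receiver clock term
    n = len(rows[0])
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
              for i in range(n)]
    covariance = invert(normal)
    return math.sqrt(sum(covariance[i][i] for i in range(n)))


# One satellite overhead and three low ones spread around the compass,
# against four satellites bunched together high in the sky.
spread = [(0, 90), (0, 20), (120, 20), (240, 20)]
clustered = [(10, 60), (30, 65), (50, 70), (20, 80)]

gdop_spread = gdop(spread)
gdop_clustered = gdop(clustered)  # considerably larger than gdop_spread
```

The same function accepts more than four satellites, in which case the matrix
AᵀA is exactly the normal matrix of the least-squares adjustment mentioned
above.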


Error source             Typical magnitude
satellite clock          2 m
satellite position       2.5 m
ionospheric delay        5 m
tropospheric delay       0.5 m
receiver noise           0.3 m
multi-path               0.5 m

Total RMSE range error:
$\sqrt{2^2 + 2.5^2 + 5^2 + 0.5^2 + 0.3^2 + 0.5^2} \approx 5.99$ m

Table 4.4: Indication of typical magnitude of errors in absolute satellite-based
positioning


Figure 4.27: Geometric dilution of precision. The four satellites used for
positioning can be in a bad constellation (a) or in a better constellation (b).

These errors are not all of similar magnitude. An overview of some typical
values (without selective availability) is provided in Table 4.4. GDOP functions
not so much as an independent error source but rather as a multiplying factor,
decreasing the precision of position and time values obtained.
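
The total in Table 4.4 is a root-sum-square of the individual contributions,
which presupposes that the error sources are independent of one another. The
sketch below repeats that computation, which evaluates to roughly 5.99 m, and
then applies a dilution-of-precision value as a multiplying factor. The factor
of 2.0 is a made-up example, not a value taken from the table.

```python
import math

# Typical magnitudes from Table 4.4, in metres.
RANGE_ERRORS_M = {
    "satellite clock": 2.0,
    "satellite position": 2.5,
    "ionospheric delay": 5.0,
    "tropospheric delay": 0.5,
    "receiver noise": 0.3,
    "multi-path": 0.5,
}


def total_range_error(components):
    """Root-sum-square of independent error contributions."""
    return math.sqrt(sum(value * value for value in components.values()))


def position_error(range_error_m, dop):
    """Scale the range error by a dilution-of-precision factor."""
    return range_error_m * dop


uere = total_range_error(RANGE_ERRORS_M)
expected_position_error = position_error(uere, dop=2.0)  # hypothetical DOP
```

Because the contributions are squared before they are summed, the ionospheric
delay dominates the total: removing it alone would reduce the range error to
about 3.3 m.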

The procedure that we discussed above is known as absolute, single-point
positioning based on code measurement. It is the fastest and simplest, yet least
accurate way of determining a position using satellites. It suffices for
recreational purposes and other applications that do not require a horizontal
accuracy better than 5-10 m. Typically, when encrypted military signals can also
be used, on a dual-frequency receiver the achievable horizontal accuracy is
2-5 m. Below, we discuss other satellite-based positioning techniques with
better accuracies.

4.2.3 Relative positioning

One technique of trying to remove errors from positioning computations is to
perform many position computations, and to determine the average over the
solutions. Many receivers allow the user to do so. It should however be clear
from the above that averaging may address random errors like signal noise,
selective availability (SA) and multi-path to some extent, but not systematic
sources of error, like incorrect satellite data, atmospheric delays, and GDOP
effects. These sources should be removed before averaging is applied. It has
been shown that averaging over 60 minutes in absolute, single-point positioning
based on code measurements, before systematic error removal, leads only to a
10-20% improvement of accuracy. In such cases, receiver averaging is therefore
of limited value, and requires long periods under near-optimal conditions.
Averaging is a good technique if systematic errors have been accounted for.
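
Why averaging helps against random but not against systematic error can be seen
in a small simulation. It is deliberately simplistic: positions are
one-dimensional, the random error is Gaussian, the systematic error is a
constant offset, and all numbers are invented.

```python
import random
import statistics


def simulate_fixes(true_value, systematic_bias, noise_sd, n, seed=42):
    """Simulate n one-dimensional position fixes (metres)."""
    rng = random.Random(seed)
    return [true_value + systematic_bias + rng.gauss(0.0, noise_sd)
            for _ in range(n)]


TRUE_EASTING = 1000.0
fixes = simulate_fixes(TRUE_EASTING, systematic_bias=4.0, noise_sd=3.0, n=3600)

scatter_of_fixes = statistics.pstdev(fixes)             # random part, about 3 m
error_of_mean = statistics.fmean(fixes) - TRUE_EASTING  # stays close to 4 m
```

The scatter of the individual fixes averages out almost completely over 3600
observations, yet the mean remains about 4 m away from the true value, because
every fix carries the same offset.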

In relative positioning, also known as differential positioning, one tries to
remove some of the systematic error sources by taking into account measurements
of these errors in a nearby stationary reference receiver with an accurately
known position. By using these systematic error findings at the reference, the
position of the target receiver of interest will become known much more
precisely.

¹⁰ Please refer to Section 5.2.2 for more detail on measurement error.

In an optimal setting, reference and target receiver experience identical
conditions and are connected by a direct data link, allowing the target to
receive correctional data from the reference. In practice, relative positioning
allows reference and target receiver to be 70-200 km apart, and they will
essentially experience similar atmospheric signal error. For each satellite in
view, the reference receiver will determine its pseudorange error. After all,
its position is known with high accuracy, so it can solve any pseudorange
equations to determine the error. Subsequently, the target receiver, having
received the error characteristics, will apply the correction for each of the
four satellite signals that it uses for positioning. In so doing, it can narrow
down its accuracy to the 0.5-5 m range.
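
The exchange between reference and target receiver can be sketched in a few
lines. The sketch makes the idealized assumption that both receivers experience
exactly the same error on each satellite signal and that receiver clock biases
have already been dealt with; coordinates, error values and function names are
invented.

```python
import math


def pseudorange_corrections(ref_pos, sat_positions, ref_pseudoranges):
    """Correction per satellite: known geometric range minus measured value."""
    return [math.dist(ref_pos, sat) - rho
            for sat, rho in zip(sat_positions, ref_pseudoranges)]


def apply_corrections(pseudoranges, corrections):
    """Add the reference station's corrections to the target's measurements."""
    return [rho + corr for rho, corr in zip(pseudoranges, corrections)]


# Invented example: two satellites, errors shared by both receivers (metres).
sats = [(15_240_000.0, 21_770_000.0, 0.0), (26_570_000.0, 0.0, 0.0)]
shared_error = [7.5, -3.2]
reference = (6_370_000.0, 0.0, 0.0)         # accurately surveyed position
target = (6_369_900.0, 40_000.0, 10_000.0)  # unknown in practice

ref_measured = [math.dist(reference, s) + e for s, e in zip(sats, shared_error)]
tgt_measured = [math.dist(target, s) + e for s, e in zip(sats, shared_error)]

corrections = pseudorange_corrections(reference, sats, ref_measured)
tgt_corrected = apply_corrections(tgt_measured, corrections)
```

In this idealized case the corrected pseudoranges equal the true ranges. In
reality the errors at the two sites differ slightly, and more so as the distance
between them grows, which is what limits the accuracy obtained.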

The above sketch assumes we needed positioning information in real time, which
called for the data link between reference and target receiver. But various uses
of satellite-based positioning do not need the real time data, and allow
post-processing of the recorded positioning data. If the target receiver records
time and position accurately, correctional data can later be used to improve the
originally recorded data.

Finally, there is also a notion of inverted relative positioning. The principles are still 
as above, but in this technique the target receiver does not correct for satellite 
pseudorange error either, but uses a data link to upload its positioning/timing 
information to a central repository, where the corrections are applied. This can 
be useful in cases where many target receivers are needed and budget does not 
allow them to be expensive. 
4.2.4 Network positioning

After discussing the advantages of relative positioning, we can move on to the
notion of network positioning: an integrated, systematic network of reference
receivers covering a large area like a continent or even the whole globe.

The organization of such a network can take different shapes, augmenting an 
already existing satellite-based system. Here we discuss a general architecture, 
consisting of a network of reference stations, strategically positioned in the area to 
be covered, each of which is constantly monitoring signals and their errors for 
all positioning satellites in view. One or more control centres receive the reference 
station data, verify this for correctness, and relay (uplink) this information to 
a geostationary satellite. The satellite will retransmit the correctional data to the 
area that it covers, so that target receivers, using their own approximate position, 
can determine how to correct for satellite signal error, and consequently obtain 
much more accurate position fixes. 

With network positioning, accuracy in the submetre range can be obtained.
Typically, advanced receivers are required, but the technology also lends itself
to solutions in which a single advanced receiver functions as a reference
receiver for simple receivers in its immediate neighbourhood.

4.2.5 Code versus phase measurements

Up until this point, we have assumed that the receiver determines the range of
a satellite by measuring time delay on the received ranging code. There exists a
more advanced range determination technique known as carrier phase measurement.
This typically requires more advanced receiver technology, and longer
observation sessions. Carrier phase measurement can currently only be used with
relative positioning, as absolute positioning using this method is not yet well
developed.

The technique aims to determine the number of cycles of the (sine-shaped) radio
signal between sender and receiver. Each cycle corresponds to one wavelength of
the signal, which in the applied L-band frequencies is 19-24 cm. Since this
number of cycles cannot be directly measured, it is determined, in a long
observation session, from the change in carrier phase with time. This change
occurs because the satellite itself is moving along its orbit. From its orbit
parameters and the change in phase over time, the number of cycles can be
derived.
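
The wavelengths quoted above follow from dividing the speed of light by the
carrier frequency. As a small check, using the two GPS carrier frequencies given
in Section 4.2.6 and a nominal range of 20,200 km chosen purely for
illustration:

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def wavelength_m(frequency_hz):
    """Wavelength of a radio signal of the given frequency."""
    return C / frequency_hz


def cycles_in_range(range_m, frequency_hz):
    """Number of carrier cycles that fit in a given distance."""
    return range_m / wavelength_m(frequency_hz)


lambda_l1 = wavelength_m(1575.42e6)  # about 0.19 m
lambda_l2 = wavelength_m(1227.60e6)  # about 0.24 m
cycles_l1 = cycles_in_range(20_200_000.0, 1575.42e6)  # order of 10^8 cycles
```

A range thus spans on the order of a hundred million cycles, which gives an
impression of why the whole number of cycles cannot simply be read off and has
to be derived from a longer observation session.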

With relative positioning techniques, a horizontal accuracy of 2 mm-2 cm can 
be achieved. This degree of accuracy makes it possible to measure tectonic plate 
movements, which can be as big as 10 cm per year in some locations on the 
planet. 



4.2.6 Positioning technology 

We include this section to provide the reader with a little information on
currently available satellite-based positioning technology. It should be noted
that this textbook will easily outlive the currency of the information contained
within it, as our technology is constantly evolving.

At present, two satellite-based positioning systems are operational (GPS and 
GLONASS), and a third is in the implementation phase (Galileo). Respectively, 
these are American, Russian and European systems. Any of these, but especially 
GPS and Galileo, will be improved over time, and will be augmented with new 
techniques. 
GPS

The NAVSTAR Global Positioning System (GPS) was declared operational in 1994,
providing Precise Positioning Services (PPS) to US and allied military forces as
well as US government agencies, and Standard Positioning Services (SPS) to
civilians throughout the world. Its space segment nominally consists of 24
satellites, each of which orbits our planet in 11h58m at an altitude of
20,200 km. There can be any number of satellites active, typically between 21
and 27. The satellites are organized in six orbital planes, somewhat irregularly
spaced, with an angle of inclination of 55-63° with the equatorial plane,
nominally having four satellites each (see Figure 4.28). This means that a
receiver on Earth will have between five and eight (sometimes up to twelve)
satellites in view at any point in time. Software packages exist to help in
planning GPS surveys, identifying expected satellite set-up for any location and
time.
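
Orbital altitude and period are linked by Kepler's third law. The sketch below
assumes a circular orbit around a spherical Earth with the WGS84 equatorial
radius and uses the standard value of the Earth's gravitational parameter, so
its results agree with the periods quoted in this section only to within a
minute or so. The function name is ours.

```python
import math

GM_EARTH = 3.986004418e14      # Earth's gravitational parameter, m^3/s^2
EARTH_RADIUS_M = 6_378_137.0   # WGS84 equatorial radius (assumed sphere)


def orbital_period_hours(altitude_m):
    """Period of a circular orbit at the given altitude above the surface."""
    a = EARTH_RADIUS_M + altitude_m  # orbit radius measured from Earth's centre
    return 2.0 * math.pi * math.sqrt(a ** 3 / GM_EARTH) / 3600.0


# Nominal altitudes as quoted in this section.
gps_hours = orbital_period_hours(20_200_000.0)      # a little under 12 hours
glonass_hours = orbital_period_hours(19_130_000.0)  # a little over 11 hours
galileo_hours = orbital_period_hours(23_222_000.0)  # about 14 hours
```

A period of just under twelve hours means that a GPS satellite completes two
orbits in roughly one rotation of the Earth, so that the same satellite
configuration recurs above a given location about once a day.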

GPS's control segment has its master control in Colorado, US, and monitor
stations in a belt around the equator, namely in Hawaii, Kwajalein Atoll in the
Marshall Islands, Diego Garcia (British Indian Ocean Territory) and Ascension
Island (UK, southern Atlantic Ocean).

The NAVSTAR satellites transmit two radio signals, namely the L1 frequency at
1575.42 MHz and the L2 frequency at 1227.60 MHz. There are also a third and
fourth signal, but they are not important for our discussion here. The first two
signals consist of:

• The carrier waves at the given frequencies,

• A coarse ranging code, known as C/A, modulated on L1,

• An encrypted precision ranging code, known as P(Y), modulated on L1 and L2,
and

• A navigation message modulated on both L1 and L2.

Figure 4.28: Constellation of satellites, four shown in only one orbit plane, in
the GPS system.


The role of L2 is to provide a second radio signal, thereby allowing (the more 
expensive) dual-frequency receivers a way of determining fairly precisely the 
actual ionospheric delay on satellite signals received. The role of the ranging 
codes is two-fold: 
1. To identify the satellite that sent the signal, as each satellite sends unique
codes, and the receiver has a look-up table for these codes, and

2. To determine the signal transit time, and thus the satellite's pseudorange. 

The navigation message contains the satellite orbit and satellite clock error
information, as well as some general system information. GPS also carries a
fifth, encrypted military signal carrying the M-code. GPS uses WGS84 as its
reference system. It has been refined on several occasions and is now aligned
with the ITRF at the level of a few centimetres worldwide. (See also
Section 4.1.1.) GPS has adopted UTC as its time system.

In the civil market, GPS receivers of varying quality are available, their
quality depending on the embedded positioning features: supporting single- or
dual-frequency, supporting only absolute or also relative positioning,
performing code measurements or also carrier phase measurements. Leica and
Trimble are two of the well-known brands in the high-precision, professional
surveying domain; Magellan and Garmin, for instance, operate in the lower price,
higher volume consumer market range, amongst others for recreational use in
outdoor activities. Many of these are single frequency receivers, doing only
code measurements, though some are capable of relative positioning. This
includes the new generation of GPS-enabled mobile phones.

GLONASS

What GPS is to the US military, is GLONASS to the Russian military, specifically 
the Russian Space Forces. Both systems were primarily designed on the basis of 
military requirements. The big difference between the two is that GPS generated 
a major interest in civil applications, thus having an important economic impact. 
This cannot be said of GLONASS.

The GLONASS space segment consists of nominally 24 satellites, organized in
three orbital planes, with an inclination of 64.8° with the equator. Orbiting
altitude is 19,130 km, with a period of revolution of 11 hours 16 min. GLONASS
uses the PZ-90 as its reference system, and like GPS uses UTC as time reference,
though with an offset for Russian daylight.

GLONASS radio signals are somewhat similar to those of GPS, but differ in the
details. Satellites use different identifier schemes, and their navigation
messages use other parameters. They also use different frequencies: GLONASS L1
is at approximately 1605 MHz (changes are underway), and L2 is at approximately
1248 MHz. Otherwise, the GLONASS system performance is rather comparable to that
of GPS.

Galileo

In the 1990s, the European Union (EU) judged that it needed to have its own
satellite-based positioning system, to become independent of the GPS monopoly
and to support its own economic growth by providing services of high reliability
under civilian control.

Galileo is the name of this EU system. The vision is that satellite-based
positioning will become even bigger due to the emergence of mobile phones
equipped with receivers, perhaps with some 400 million users by the year 2015.
Development of the system has experienced substantial delays, and at the time of
writing European ministers insist that Galileo should be up and running by the
end of 2013. The completed system will have 27 satellites, with three in
reserve, orbiting in one of three, equally spaced, circular orbits at an
elevation of 23,222 km, inclined 56° with the equator. This higher inclination,
when compared to that of GPS, has been chosen to provide better positioning
coverage at high latitudes, such as northern Scandinavia where GPS performs
rather poorly.

In June 2004, the EU and the US agreed to make Galileo and GPS compatible by
adoption of interchangeable satellite signal set-ups. The effect of this
agreement is that the Galileo/GPS tandem satellite system will have so many
satellites in the sky (close to 60) that a receiver can almost always find an
optimal constellation in view. This will be especially useful in situations
where in the past bad signal reception happened: in built-up areas and forests,
for instance. It will also bring the implementation of a Global Navigation
Satellite System (GNSS) closer as positional accuracy and reliability will
improve. With such a system, eventually one expects to implement fully automated
air and road traffic. Automatic aircraft landing, for instance, requires
horizontal accuracy in the order of 4 m, and vertical accuracy below 1 m: these
requirements can currently not be achieved reliably.

The Galileo Terrestrial Reference Frame (GTRF) will be a realization of the ITRS
independently set up from that of GPS, so that one system can back up the other.
Positional differences between the WGS84 and the GTRF will be at worst a few
centimetres. The Galileo System Time (GST) will closely follow International
Atomic Time (TAI) with a time offset of less than 50 nsec for 95 % of the time
over any period of a year. Information on the actual offset between GST and TAI,
and between GST and UTC (as used in GPS) will be broadcast in the Galileo
satellite signal.

Satellite-based augmentation systems

Satellite-based augmentation systems (SBAS) aim to improve accuracy and
reliability of satellite-based positioning (see Section 4.2.4 on network
positioning) in support of safety-critical navigation applications such as
aircraft operations near airfields. The typical technique is to provide an
extra, now geostationary, satellite that has a large service area like a
continent, and which sends
differential data about standard positioning satellites that are currently in view 
in its service area. If multiple ground reference stations are used, the quality of 
the differential data can be quite good and reliable. Signals typically use the fre¬ 
quency already in use by the positioning satellites, so that receivers can receive 
the differential code without problem. 

Not all advantages of satellite augmentation will be useful for all receivers.
For consumer market receivers, the biggest advantage, as compared to standard
relative positioning, is that SBAS provides an ionospheric correction grid for
its service area, from which a correction specific for the location of the
receiver can be retrieved. This is not true in relative positioning, where the
reference station determines the error it experiences, and simply broadcasts
this information for nearby target receivers to use. With SBAS, the receiver
obtains information that is best viewed as a geostatistical interpolation of
errors from multiple reference stations. More advanced receivers will be able to
deploy also other differential data such as corrections on satellite position
and satellite clock drift.

Currently, three systems are operational: WAAS (Wide-Area Augmentation System)
for North America, EGNOS (European Geostationary Navigation Overlay Service)
for Europe, and MSAS (Multi-functional Satellite Augmentation System) for
eastern Asia. The ground segment of WAAS consists of 24 control stations,
spread over North America; that of EGNOS has 34 stations. These
three systems are compatible, guaranteeing international coverage.

Signals of the respective satellites (under various names like AOR, Artemis, IOR,
Inmarsat, MTSAT) can usually be received outside their respective service areas,
but the use of these signals is discouraged, as they will not help improve
positional accuracy. Satellite identifiers, as shown in the receiver, have numbers
above 30, setting them apart from standard positioning satellites.
Summary


This chapter focuses upon locating objects and events on the Earth's surface. In
this context, a number of principles related to spatial reference systems, including
vertical and horizontal datums, were discussed in Section 4.1.^^

To summarise, each projection and datum has particular characteristics that make
it useful for specific mapping purposes. A projection is chosen to minimize the
errors for the area and scale of the mapping project being undertaken, and to
provide the required distortion property, which in turn depends on the
purpose for which the map will be used. We need to be aware of issues brought
about by the combination of spatial data from different sources that use different
reference systems. This issue is becoming increasingly important, as more and
more data is being shared. Often, transformations are necessary to enable the
combination of disparate data layers.

Section 4.2 discussed the various methods of satellite-based positioning, from
basic principles to characteristics of current implementations, and the different
levels of accuracy associated with each of these methods. This included a discussion
of sources of error in the context of both absolute and relative positioning.
Key aspects of positional accuracy are dealt with in more detail in the following
chapter, in the context of data quality.


^^This section is accompanied by a website at http://kartoweb.itc.nl/geometrics. Here,
interested readers can find more background information and a list of frequently asked questions.
Questions

1. You wish to reconcile spatial data from two neighbouring countries to resolve
a border dispute. Published maps in the two countries are based on
different local horizontal datums and map projections. Which steps should
you take to render the data sets spatially compatible?

2. On page 196 we mentioned that in geodetic practice the definition of
an ellipsoid is usually by its semi-major axis a and flattening f. Flattening
is dependent on both the semi-major axis a and the semi-minor axis
b. Assume that the semi-major axis a of an ellipsoid is 6378137 m and the
flattening f is 1:298.257. Using these facts determine the semi-minor axis b
(make use of the given equations).

3. You are required to match GPS data with some map data. The GPS data 
and the map layer are based on different horizontal datums. Which steps 
should you take to make the GPS data spatially compatible with the map 
data? 

4. Suppose you wish to produce a small scale thematic map of your country.
The map should show the population densities for the different regions (or
provinces). What would be a good map projection for the representation
of the population densities of your country? Consider the class of the
projection, the projection property and the line(s) of intersection or the point
or line of tangency.

5. In Section 4.2.1, we discussed the principles of absolute positioning. To a
large extent, there is an analogy with how human beings assess the risks
of a thunderstorm with lightning. Explain that analogy, and discuss the
'measuring errors'.

6. Estimate a realistic distance between a GPS satellite that is in view and
the receiver that you are holding, clarifying your assumptions. Indicate
minimum and maximum values. Finally, compute the time delay a satellite
message incurs before being received, again clarifying the assumptions
made.

7. On page 247 we mentioned the size of the pseudorange error with respect to
satellite clock error. Explain why the estimates were as given. Also analyse,
using geometric arguments, what positioning error might result from
a single satellite clock error of 0.000001 sec.

8. How could one force a GPS receiver to operate in 2D positioning mode?
How would one set up an experiment to determine positioning accuracy
in this mode and the relation to actual height of the receiver?
Chapter 5

Data entry and preparation


Spatial data can be obtained from various sources. It can be collected from
scratch, using direct spatial data acquisition techniques, or indirectly, by making
use of existing spatial data collected by others. Under the first heading we
could include field survey data and remotely sensed images. Under the second
fall paper maps and existing digital data sets.

This chapter discusses the collection and use of data under both of these headings.
It seeks to prepare users of spatial data by drawing attention to issues
concerning data accuracy and quality. A range of procedures for data checking
and clean-up are discussed to prepare data for analysis, including several
methods for interpolating point data.


5.1 Spatial data input 
5.1.1 Direct spatial data capture

One way to obtain spatial data is by direct observation of the relevant geographic
phenomena. This can be done through ground-based field surveys, or by using
remote sensors in satellites or airplanes (see Chapter 4). Many Earth sciences
have developed their own survey techniques, as ground-based techniques remain
the most important source for reliable data in many cases.


Data which is captured directly from the environment is known as primary 
data. 


With primary data the core concern in knowing its properties is to know the 
process by which it was captured, the parameters of any instruments used and 
the rigour with which quality requirements were observed. 

Remotely sensed imagery is usually not fit for immediate use, as various sources
of error and distortion may have been present, and the imagery should first be
freed from these. This is the domain of remote sensing, and these issues are
discussed further in Principles of Remote Sensing [53]. In the context of this book:


An image refers to raw data produced by an electronic sensor, which are not 
pictorial, but arrays of digital numbers related to some property of an object 
or scene, such as the amount of reflected light. 


For an image, no interpretation of reflectance values as thematic or geographic
characteristics has taken place. When the reflectance values have been translated
into some 'thematic' variable, we refer to it as a raster. Section 2.3.1 provides
more detail on rasters. It is interesting to note that we refer to image pixels but to
raster cells, although both are stored in a GIS in the same way.

In practice, it is not always feasible to obtain spatial data by direct spatial data 
capture. Factors of cost and available time may be a hindrance, or previous 
projects sometimes have acquired data that may fit the current project's purpose. 
5.1.2 Indirect spatial data capture

In contrast to direct methods of data capture described above, spatial data can
also be sourced indirectly. This includes data derived from existing paper maps
through scanning, data digitized from a satellite image, processed data purchased
from data capture firms or international agencies, and so on. This type
of data is known as secondary data:


Any data which is not captured directly from the environment is known as
secondary data.


Below we discuss key sources of secondary data and issues related to their use
in analysis of which the user should be aware.
Digitizing

A traditional method of obtaining spatial data is through digitizing existing paper
maps. This can be done using various techniques. Before adopting this
approach, one must be aware that positional errors already in the paper map
will further accumulate, and one must be willing to accept these errors.

There are two forms of digitizing: on-tablet and on-screen manual digitizing. In
on-tablet digitizing, the original map is fitted on a special surface (the tablet),
while in on-screen digitizing, a scanned image of the map (or some other image)
is shown on the computer screen. In both of these forms, an operator follows the
map's features (mostly lines) with a mouse device, thereby tracing the lines, and
storing location coordinates relative to a number of previously defined control
points. The function of these points is to 'lock' a coordinate system onto the
digitized data: the control points on the map have known coordinates, and by
digitizing them we tell the system implicitly where all other digitized locations
are. At least three control points are needed, but preferably more should be
digitized to allow a check on the positional errors made.
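
The role of the control points can be illustrated with a small sketch. It assumes that the tablet coordinates are related to map coordinates by an affine transformation (the text does not prescribe a particular transformation), fits that transformation exactly from three control points, and uses a fourth point as a check on the positional error. The function names `fit_affine`, `apply_affine` and `check_residual`, and all coordinate values, are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear; choose other points")
    solution = []
    for col in range(3):
        mc = [list(row) for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution


def fit_affine(tablet_pts, map_pts):
    """Fit X = a*u + b*v + c and Y = d*u + e*v + f from three control points."""
    m = [[u, v, 1.0] for u, v in tablet_pts]
    x_params = solve3(m, [x for x, _ in map_pts])
    y_params = solve3(m, [y for _, y in map_pts])
    return x_params, y_params


def apply_affine(params, pt):
    """Transform a digitized (tablet) point to map coordinates."""
    (a, b, c), (d, e, f) = params
    u, v = pt
    return a * u + b * v + c, d * u + e * v + f


def check_residual(params, tablet_pt, known_map_pt):
    """Distance between a transformed check point and its known position."""
    x, y = apply_affine(params, tablet_pt)
    return math.hypot(x - known_map_pt[0], y - known_map_pt[1])


# Three control points: tablet coordinates (cm) and known map coordinates (m).
tablet = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
known = [(1000.0, 2000.0), (1500.0, 2000.0), (1000.0, 2500.0)]
params = fit_affine(tablet, known)

# A fourth, independently known point reveals the digitizing error.
print(check_residual(params, (10.02, 9.98), (1500.0, 2500.0)))
```

With more than three control points, a least-squares fit would normally replace the exact solution used here, and the residuals at all points would be inspected.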

Another set of techniques also works from a scanned image of the original map,
but uses the GIS to find features in the image. These techniques are known as
semi-automatic or automatic digitizing, depending on how much operator interaction
is required. If vector data is to be distilled from this procedure, a process
known as vectorization follows the scanning process. This procedure is less
labour-intensive, but can only be applied on relatively simple sources.
Scanning

An 'office' scanner illuminates a document and measures the intensity of the
reflected light with a CCD array. The result is an image as a matrix of pixels,
each of which holds an intensity value. Office scanners have a fixed maximum
resolution, expressed as the highest number of pixels they can identify per inch;
the unit is dots-per-inch (dpi). For manual on-screen digitizing of a paper map,
a resolution of 200-300 dpi is usually sufficient, depending on the thickness of
the thinnest lines. For manual on-screen digitizing of aerial photographs, higher
resolutions are recommended, typically at least 800 dpi.

(Semi-)automatic digitizing requires a resolution that results in scanned lines of
at least three pixels wide to enable the computer to trace the centre of the lines
and thus avoid displacements. For paper maps, a resolution of 300-600 dpi is
usually sufficient. Automatic or semi-automatic tracing from aerial photographs
can only be done in a limited number of cases. Usually, the information from
aerial photos is obtained through visual interpretation.
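
The relation between scan resolution and line width is simple arithmetic, sketched below. The line width of 0.2 mm and the map scale of 1:50,000 are assumed example values, and the function names are hypothetical.

```python
MM_PER_INCH = 25.4


def line_width_in_pixels(line_width_mm, dpi):
    """Number of pixels covering a printed line of the given width."""
    return line_width_mm / MM_PER_INCH * dpi


def min_dpi_for_line(line_width_mm, min_pixels=3):
    """Lowest resolution at which the line is at least min_pixels wide."""
    return min_pixels * MM_PER_INCH / line_width_mm


def ground_pixel_size_m(dpi, scale_denominator):
    """Ground distance (m) covered by one pixel of a scanned map."""
    return MM_PER_INCH / dpi / 1000.0 * scale_denominator


# Assumed example: thinnest map line of 0.2 mm on a 1:50,000 map.
print(line_width_in_pixels(0.2, 300))    # pixels per line width at 300 dpi
print(min_dpi_for_line(0.2))             # dpi needed for a three-pixel line
print(ground_pixel_size_m(300, 50000))   # metres on the ground per pixel
```

For the assumed 0.2 mm line, the three-pixel requirement leads to a resolution inside the 300-600 dpi range quoted above.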

After scanning, the resulting image can be improved with various image processing
techniques. It is important to understand that scanning does not result
in a structured data set of classified and coded objects. Additional work is
required to recognize features and to associate categories and other thematic
attributes with them.
Vectorization

The process of distilling points, lines and polygons from a scanned image is
called vectorization. As scanned lines may be several pixels wide, they are often
first thinned to retain only the centreline. The remaining centreline pixels are
converted to series of (x, y) coordinate pairs, defining a polyline. Subsequently,
features are formed and attributes are attached to them. This process may be
entirely automated or performed semi-automatically, with the assistance of an
operator. Pattern recognition methods, like Optical Character Recognition (OCR)
for text, can be used for the automatic detection of graphic symbols and text.

Vectorization causes errors such as small spikes along lines, rounded corners, 
errors in T- and X-junctions, displaced lines or jagged curves. These errors are 
corrected in an automatic or interactive post-processing phase. The phases of 
the vectorization process are illustrated in Figure 5.1. 
Figure 5.1: The phases of the vectorization process and the various sorts
of small error caused by it. The post-processing phase makes the final repairs.
Selecting a digitizing technique

The choice of digitizing technique depends on the quality, complexity and contents
of the input document. Complex images are better manually digitized;
simple images are better automatically digitized. Images that are full of detail
and symbols, like topographic maps and aerial photographs, are therefore better
manually digitized.

In practice, the optimal choice may be a combination of methods. For example,
contour line film separations can be automatically digitized and used to produce
a DEM. Existing topographic maps must be digitized manually, but new,
geometrically corrected aerial photographs, with vector data from the topographic
maps displayed directly over them, can be used for updating existing data files by
means of manual on-screen digitizing.
5.1.3 Obtaining spatial data elsewhere

Over the past two decades, spatial data has been collected in digital form at an
increasing rate, and stored in various databases by the individual producers for their
own use and for commercial purposes. More and more of this data is being
shared among GIS users, for several reasons. High quality data remains both
costly and time-consuming to collect and verify, and more and more GIS
applications are looking at not just local, but national or even global processes.
Some of this data is freely available, although other data is only available
commercially, as is the case for most satellite imagery. As we will see
below, new technologies have played a key role in the increasing availability of
geospatial data. As a result of this increasing availability, we have to be more
careful that the data we have acquired is of sufficient quality to be used in analysis
and decision making. For this reason, we discuss key data quality parameters
in Section 5.2.
Clearinghouses and web portals

Spatial data can also be acquired from centralized repositories. More often those
repositories are embedded in Spatial Data Infrastructures (see Section 3.2.3),
which make the data available through what is sometimes called a spatial data
clearinghouse. This is essentially a marketplace where data users can 'shop'. It
will be no surprise that such markets for digital data have an entrance through
the world wide web. The first entrance is typically formed by a web portal which
categorizes all available data and provides a local search engine and links to data
documentation (also called metadata). It often also points to data viewing and
processing services. Standards-based geo-webservices have become the common
technology behind such portal services (see below for further detail).
Metadata

Metadata is defined as background information that describes all necessary
information about the data itself. More generally, it is known as 'data about data'.
This includes:


• Identification information: Data source(s), time of acquisition, etc.

• Data quality information: Positional, attribute and temporal accuracy,
lineage, etc.

• Entity and attribute information: Related attributes, units of measure, etc.


In essence, metadata answer who, what, when, where, why, and how questions
about all facets of the data made available. Maintaining metadata is a key part
in maintaining data and information quality in GIS. This is because it can serve
different purposes, from description of the data itself through to providing
instructions for data handling. Depending on the type and amount of metadata
provided, it could be used to determine the data sets that exist for a geographic
location, evaluate whether a given data set meets a specified need, or to process
and use a data set.
Data formats and standards

An important problem in any environment involved in digital data exchange
is that of data formats and data standards. Different formats were implemented
by different GIS vendors; different standards came about with different
standardization committees. The phrase 'data standard' refers to an agreed upon
way of representing data in a system in terms of content, type and format. The
good news about both formats and standards is that there are many to choose
from; the bad news is that this can lead to a range of conversion problems.
Several metadata standards for digital spatial data exist, including the
International Organization for Standardization (ISO) and the Open Geospatial
Consortium (OGC) standards.
5.2 Data quality

With the advent of satellite remote sensing, GPS and GIS technology, and the
increasing availability of digital spatial data, resource managers and others who
formerly relied on the surveying and mapping profession to supply high quality
map products are now in a position to produce maps themselves. At the same
time, GISs are being increasingly used for decision support applications, with
increasing reliance on secondary data sourced through data providers or via the
internet, through geo-webservices. The implications of using low-quality data
in important decisions are potentially severe. There is also a danger that
uninformed GIS users introduce errors by incorrectly applying geometric and other
transformations to the spatial data held in their database.

Below we look at the main issues related to data quality in spatial data. As
outlined in Section 1.1.4, we will discuss positional, temporal and attribute
accuracy, lineage, completeness, and logical consistency. We will begin with a brief
discussion of the terms accuracy and precision, as these are often taken to mean
the same thing. For a more detailed discussion and advanced topics relating to
data quality, the reader is referred to [17].
5.2.1 Accuracy and precision

So far we have used the terms error, accuracy and precision without appropriately
defining them. Accuracy should not be confused with precision, which is
a statement of the smallest unit of measurement to which data can be recorded.

In conventional surveying and mapping practice, accuracy and precision are
closely related. Instruments with an appropriate precision are employed, and
surveying methods chosen, to meet specified accuracy tolerances. In GIS, however,
the numerical precision of computer processing and storage usually exceeds
the accuracy of the data. This can give rise to so-called spurious accuracy,
for example calculating area sizes to the nearest m² from coordinates obtained
by digitizing a 1 : 50,000 map.
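
A short sketch makes the point about spurious accuracy concrete. It assumes a digitizing error of 0.25 mm on the map sheet, which is an illustrative figure and not a value from this book; the parcel coordinates and function names are likewise hypothetical.

```python
def ground_error_m(map_error_mm, scale_denominator):
    """Convert an error measured on the map sheet to metres on the ground."""
    return map_error_mm / 1000.0 * scale_denominator


def shoelace_area(coords):
    """Area of a simple polygon from its vertex coordinates."""
    n = len(coords)
    total = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Assumed digitizing error of 0.25 mm on a 1:50,000 map.
error = ground_error_m(0.25, 50000)

# A parcel of roughly 1 km by 1 km, digitized from that map (metres).
parcel = [(0.0, 0.0), (1000.4, 0.0), (1000.4, 999.7), (0.0, 999.7)]
area = shoelace_area(parcel)

# The computer happily reports the area to a fraction of a square metre,
# yet shifting one side by the ground error changes it by about
# 1000 m * error, which is thousands of square metres.
print(area, error, 1000.0 * error)
```

The area is stored with many decimals, but only the leading digits are supported by the accuracy of the source map.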

Using graphs that display the probability distribution (for which see below) of
a measurement against the true value T, the relationship between accuracy and
precision can be clarified. In Figure 5.2, we depict the cases of good/bad accuracy
against good/bad precision.^ An accurate measurement has a mean close to
the true value; a precise measurement has a sufficiently small variance.
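
The distinction can be made tangible with two sets of repeated measurements of a quantity whose true value is assumed known. In this sketch the deviation of the mean from the true value stands for (in)accuracy and the standard deviation for (im)precision; the measurement values and the function name `bias_and_spread` are invented for illustration.

```python
from statistics import mean, pstdev


def bias_and_spread(measurements, true_value):
    """Return (mean - true value, standard deviation) of the measurements."""
    return mean(measurements) - true_value, pstdev(measurements)


TRUE_VALUE = 100.0

# Accurate but less precise: centred on the true value, wider spread.
set_a = [100.1, 99.9, 100.0, 100.2, 99.8]

# Precise but inaccurate: tightly clustered around the wrong value.
set_b = [102.0, 102.1, 101.9, 102.0, 102.0]

print(bias_and_spread(set_a, TRUE_VALUE))
print(bias_and_spread(set_b, TRUE_VALUE))
```

The second set would look convincing if only its spread were inspected, which is why an independent reference value is needed to judge accuracy.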


^Here we use the terms 'good' and 'bad' to illustrate the extremes of both accuracy and
precision. In real world terms we refer to whether data is 'fit for use' for a given application.

Figure 5.2: A measurement probability function and the underlying true
value T: (a) bad accuracy and precision, (b) bad accuracy/good precision,
(c) good accuracy/bad precision, and (d) good accuracy and precision.
5.2.2 Positional accuracy

The surveying and mapping profession has a long tradition of determining and
minimizing errors. This applies particularly to land surveying and photogrammetry,
both of which tend to regard positional and height errors as undesirable.

Cartographers also strive to reduce geometric and attribute errors in their
products, and, in addition, define quality in specifically cartographic terms, for
example quality of linework, layout, and clarity of text.

It must be stressed that all measurements made with surveying and photogrammetric
instruments are subject to error. These include:

1. Human errors in measurement (e.g. reading errors) generally referred to as
gross errors or blunders. These are usually large errors resulting from
carelessness which could be avoided through careful observation, although it
is never absolutely certain that all blunders have been avoided or eliminated.

2. Instrumental or systematic errors (e.g. due to misadjustment of instruments).
This leads to errors that vary systematically in sign and/or magnitude, but
can go undetected by repeating the measurement with the same instrument.
Systematic errors are particularly dangerous because they tend to
accumulate.

3. So-called random errors caused by natural variations in the quantity being
measured. These are effectively the errors that remain after blunders and
systematic errors have been removed. They are usually small, and dealt
with in least-squares adjustment.
Section 4.2 discussed the errors inherent in various methods of spatial positioning.
Below we will look at more general ways of quantifying positional accuracy
using root mean square error (RMSE).

Measurement errors are generally described in terms of accuracy. In the case
of spatial data, accuracy may relate not only to the determination of coordinates
(positional error) but also to the measurement of quantitative attribute data. The
accuracy of a single measurement can be defined as:


"the closeness of observations, computations or estimates to the true values
or the values perceived to be true" [41].


In the case of surveying and mapping, the 'truth' is usually taken to be a value
obtained from a survey of higher accuracy, for example by comparing
photogrammetric measurements with the coordinates and heights of a number of
independent check points determined by field survey. Although it is useful for
assessing the quality of definite objects, such as cadastral boundaries, this
definition clearly has practical difficulties in the case of natural resource mapping
where the 'truth' itself is uncertain, or boundaries of phenomena become fuzzy.
This type of uncertainty in natural resource data is elaborated upon on page 295.

Prior to the availability of GPS, resource surveyors working in remote areas
sometimes had to be content with ensuring an acceptable degree of relative
accuracy among the measured positions of points within the surveyed area. If
location and elevation are fixed with reference to a network of control points that
are assumed to be free of error, then the absolute accuracy of the survey can be
determined.
Root mean square error

Location accuracy is normally measured as a root mean square error (RMSE). The
RMSE is similar to, but not to be confused with, the standard deviation of a
statistical sample. The value of the RMSE is normally calculated from a set of
check measurements (coordinate values from an independent source of higher
accuracy for identical points). The differences at each point can be plotted as
error vectors, as is done in Figure 5.3 for a single measurement. The error vector
can be seen as having constituents in the x- and y-directions, which can be
recombined by vector addition to give the error vector representing its locational
error.

Figure 5.3: The positional error of a measurement can be expressed as a
vector, which in turn can be viewed as the vector addition of its constituents
in x- and y-direction, respectively $\delta x$ and $\delta y$.


For each checkpoint, the error vector has components $\delta x$ and $\delta y$. The observed
errors should be checked for a systematic error component, which may indicate
a (possibly repairable) lapse in the measurement method. Systematic error has
occurred when $\sum_{i=1}^{n} \delta x_i \neq 0$ or $\sum_{i=1}^{n} \delta y_i \neq 0$.

The systematic error $\overline{\delta x}$ in x is then defined as the average deviation from the
true value:

$$\overline{\delta x} = \frac{1}{n} \sum_{i=1}^{n} \delta x_i .$$

Analogously to the calculation of the variance and standard deviation of a
statistical sample, the root mean square errors $m_x$ and $m_y$ of a series of
coordinate measurements are calculated as the square root of the average
squared deviations:

$$m_x = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \delta x_i^2} \quad \text{and} \quad m_y = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \delta y_i^2},$$

where $\delta x_i^2$ stands for $\delta x_i \cdot \delta x_i$. The total RMSE is obtained with the formula

$$m_{total} = \sqrt{m_x^2 + m_y^2},$$

which, by the Pythagorean rule, is the length of the average (root squared)
vector.
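
The calculation can be sketched in a few lines. Here the symbols m_x, m_y and m_total denote the root mean square errors in x, in y and in total, as defined in this section; the check point coordinates are invented and the function name `rmse_report` is hypothetical.

```python
import math


def rmse_report(measured, reference):
    """Return (mean dx, mean dy, m_x, m_y, m_total) for paired points.

    measured  -- list of (x, y) coordinates being assessed
    reference -- list of (x, y) check coordinates of higher accuracy
    """
    n = len(measured)
    dx = [m[0] - r[0] for m, r in zip(measured, reference)]
    dy = [m[1] - r[1] for m, r in zip(measured, reference)]
    # A mean error clearly different from zero hints at systematic error.
    mean_dx = sum(dx) / n
    mean_dy = sum(dy) / n
    # Root of the average squared deviation per axis.
    m_x = math.sqrt(sum(d * d for d in dx) / n)
    m_y = math.sqrt(sum(d * d for d in dy) / n)
    m_total = math.sqrt(m_x ** 2 + m_y ** 2)
    return mean_dx, mean_dy, m_x, m_y, m_total


reference = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
measured = [(3.0, 4.0), (97.0, -4.0), (103.0, 104.0), (-3.0, 96.0)]
print(rmse_report(measured, reference))
```

In this invented example the mean errors are zero, so no systematic component is suggested, while the errors per axis are 3 and 4 units and combine to a total RMSE of 5 units.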
Accuracy tolerances


Many kinds of measurement can be naturally represented by a bell-shaped
probability density function p, as depicted in Figure 5.4(a). This function is known
as the normal (or Gaussian) distribution of a continuous, random variable, in the
figure indicated as Y. Its shape is determined by two parameters: $\mu$, which is the
mean expected value for Y, and $\sigma$, which is the standard deviation of Y. A small
$\sigma$ leads to a narrower, more sharply peaked bell shape.

Figure 5.4: (a) Probability density function p of a variable Y, with its mean $\mu$
and standard deviation $\sigma$. (b) The probability that Y is in the range
$[\mu - \sigma, \mu + \sigma]$.

Any probability density function p has the characteristic that the area between its
curve and the horizontal axis has size 1. Probabilities P can be inferred from p as
the size of an area under p's curve. Figure 5.4(b), for instance, depicts
$P(\mu - \sigma < Y < \mu + \sigma)$, i.e. the probability that the value for Y is within
distance $\sigma$ from $\mu$. In a normal distribution this specific probability for Y is
always the same, approximately 0.6827.
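
This probability does not depend on the particular mean or standard deviation, which can be checked numerically. The sketch below uses the error function from Python's standard library; the function name `prob_within` is hypothetical.

```python
import math


def prob_within(k):
    """P(mu - k*sigma < Y < mu + k*sigma) for a normally distributed Y."""
    return math.erf(k / math.sqrt(2.0))


# One, two and three standard deviations around the mean.
for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

For k = 1 the result is about 0.683, in line with the value quoted above.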

The RMSE can be used to assess the probability that a particular set of
measurements does not deviate too much from, i.e. is within a certain range of, the
'true' value. In the case of coordinates, the probability density function often is
considered to be that of a two-dimensional normally distributed variable (see
Figure 5.5). The three standard probability values associated with this
distribution are:

• 0.50 for a circle with a radius of 1.1774 $m_x$ around the mean (known as the
circular error probable, CEP);

• 0.6321 for a circle with a radius of 1.4142 $m_x$ around the mean (known as the
root mean square error, RMSE);

• 0.90 for a circle with a radius of 2.146 $m_x$ around the mean (known as the
circular map accuracy standard, CMAS).
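
The three radii follow from the distribution itself, under the assumption (made here for the sketch) that the errors in x and y are independent and have the same standard deviation. The probability of falling inside a circle of radius k times that standard deviation is then 1 - exp(-k²/2). The function names are hypothetical.

```python
import math


def prob_within_radius(k):
    """P(point lies within k*sigma of the mean), circular normal distribution."""
    return 1.0 - math.exp(-k * k / 2.0)


def radius_factor(p):
    """Radius, in units of sigma, of the circle that contains probability p."""
    return math.sqrt(-2.0 * math.log(1.0 - p))


print(radius_factor(0.50))                 # factor for CEP
print(prob_within_radius(math.sqrt(2.0)))  # probability for the RMSE circle
print(radius_factor(0.90))                 # factor for CMAS
```

The factor for the RMSE circle is the square root of 2, because with equal errors in both directions the total RMSE is the square root of 2 times the RMSE per axis.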



Figure 5.5: Probability density p of a normally distributed, two-dimensional
variable (X, Y) (also known as a normal, bivariate distribution). In
the ground plane, from inside out, are indicated the circles respectively
associated with CEP, RMSE and CMAS.


The RMSE provides an estimate of the spread of a series of measurements around
their (assumed) 'true' values. It is therefore commonly used to assess the quality
of transformations such as the absolute orientation of photogrammetric models
or the spatial referencing of satellite imagery. The RMSE also forms the basis
of various statements for reporting and verifying compliance with defined map
accuracy tolerances. An example is the American National Map Accuracy
Standard, which states that:

"No more than 10% of well-defined points on maps of 1:20,000 scale or
greater may be in error by more than 1/30 inch."

Normally, compliance to this tolerance is based on at least 20 well-defined
checkpoints.
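
A tolerance of this kind is easily translated into ground units and checked against a set of checkpoint errors. The sketch below applies the rule as quoted above to a 1:20,000 map; the checkpoint errors are invented and the function names are hypothetical.

```python
INCH_IN_METRES = 0.0254


def ground_tolerance_m(scale_denominator, map_tolerance_inch=1.0 / 30.0):
    """Ground distance (m) that corresponds to the tolerance on the map."""
    return map_tolerance_inch * INCH_IN_METRES * scale_denominator


def complies(errors_m, scale_denominator, max_fraction=0.10):
    """True if no more than max_fraction of the errors exceed the tolerance."""
    tolerance = ground_tolerance_m(scale_denominator)
    exceeding = sum(1 for e in errors_m if e > tolerance)
    return exceeding <= max_fraction * len(errors_m)


# Invented positional errors (m) at 20 well-defined checkpoints.
errors = [5.0] * 18 + [20.0] * 2

print(ground_tolerance_m(20000))  # tolerance on the ground, in metres
print(complies(errors, 20000))    # two of twenty points exceed it
```

At 1:20,000, 1/30 inch on the map corresponds to roughly 17 m on the ground, so the invented data set just meets the 10% criterion.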
The epsilon band

As a line is composed of an infinite number of points, confidence limits can be
described by a so-called epsilon ($\varepsilon$) or Perkal band at a fixed distance on either
side of the line (Figure 5.6). The width of the band is based on an estimate of the
probable location error of the line, for example to reflect the accuracy of manual
digitizing. The epsilon band may be used as a simple means for assessing the
likelihood that a point receives the correct attribute value (Figure 5.7).
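
Testing whether a point falls inside the epsilon band of a digitized line amounts to comparing its distance to the line with the band's half-width. The sketch below does this for a polyline; the coordinates, the value of epsilon and the function names are assumptions for illustration only.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Position of the foot of the perpendicular, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_epsilon_band(p, polyline, epsilon):
    """True if p lies within distance epsilon of any segment of the polyline."""
    return any(
        point_segment_distance(p, a, b) <= epsilon
        for a, b in zip(polyline, polyline[1:])
    )


boundary = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
print(in_epsilon_band((5.0, 0.5), boundary, 1.0))  # close to the boundary
print(in_epsilon_band((5.0, 3.0), boundary, 1.0))  # well away from it
```

A point inside the band is one whose polygon membership, and hence attribute value, is uncertain given the assumed location error of the boundary.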



Figure 5.6: The $\varepsilon$- or Perkal band is formed by rolling an imaginary circle
of a given radius along a line.

Figure 5.7: The $\varepsilon$-band may be used to assess the likelihood that a point falls
within a particular polygon. Source: [43]. Point 3 is less likely part of the
middle polygon than point 2.
Describing natural uncertainty in spatial data

There are many situations, particularly in surveys of natural resources, where,
according to Burrough, "practical scientists, faced with the problem of dividing
up undividable complex continua have often imposed their own crisp structures
on the raw data" [10, p. 16]. In practice, the results of classification are normally
combined with other categorical layers and continuous field data to identify,
for example, areas suitable for a particular land use. In a GIS, this is normally
achieved by overlaying the appropriate layers using logical operators.

Particularly in natural resource maps, the boundaries between units may not
actually exist as lines but only as transition zones, across which one area
continuously merges into another. In these circumstances, rigid measures of positional
accuracy, such as RMSE (Figure 5.3), may be virtually insignificant in comparison
to the uncertainty inherent in vegetation and soil boundaries, for example.

In conventional applications of the error matrix to assess the quality of nominal
(categorical) data such as land use, individual samples can be considered in
terms of Boolean set theory. The Boolean membership function is binary, i.e. an
element is either member of the set (membership is true) or it is not member
of the set (membership is false). Such a membership notion is well-suited to
the description of spatial features such as land parcels where no ambiguity is
involved and an individual ground truth sample can be judged to be either correct
or incorrect. As Burrough notes, "increasingly, people are beginning to realize
that the fundamental axioms of simple binary logic present limits to the way we
think about the world. Not only in everyday situations, but also in formalized
thought, it is necessary to be able to deal with concepts that are not necessarily
true or false, but that operate somewhere in between."
Since its original development by Zadeh [58], there has been considerable
discussion of fuzzy, or continuous, set theory as an approach for handling
imprecise spatial data. In GIS, fuzzy set theory appears to have two particular
benefits:

1. The ability to handle logical modelling (map overlay) operations on inexact 
data, and 

2. The possibility of using a variety of natural language expressions to qualify 
uncertainty. 


Fuzzy set theory 


Unlike Boolean sets, fuzzy or continuous sets have a membership function, which
can assign to a member any value between 0 and 1 (see Figure 5.8). The
membership function $MF^B$ of the Boolean set of Figure 5.8(a) can be defined
as follows:

$$
MF^B(x) = \begin{cases} 1 & \text{if } b_1 \le x \le b_2 \\ 0 & \text{otherwise} \end{cases}
$$

The crisp and uncertain set membership functions of Figure 5.8 are illustrated
for the one-dimensional case. Obviously, in spatial applications of fuzzy set
techniques we typically would use two-dimensional sets (and membership
functions).

The continuous membership function of Figure 5.8(b), in contrast to function
$MF^B$ above, can be defined as a function $MF^C$, following Heuvelink in [21]:

$$
MF^C(x) = \begin{cases}
\dfrac{1}{1 + \left(\frac{x - b_1}{d_1}\right)^2} & \text{if } x < b_1 \\
1 & \text{if } b_1 \le x \le b_2 \\
\dfrac{1}{1 + \left(\frac{x - b_2}{d_2}\right)^2} & \text{if } x > b_2
\end{cases}
$$

The parameters $d_1$ and $d_2$ denote the width of the transition zone around
the kernel of the class such that $MF^C(x) = 0.5$ at the thresholds $b_1 - d_1$
and $b_2 + d_2$, respectively. If $d_1$ and $d_2$ are both zero, the function
$MF^C$ reduces to $MF^B$.
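
The two membership functions can be sketched in Python as follows. This is a minimal illustration, not the book's own code: the transition zones are assumed to follow the form 1/(1 + ((x - b)/d)^2), which gives a membership of 0.5 at b1 - d1 and b2 + d2 as stated above. The function names and the soil-depth class are hypothetical.

```python
def mf_boolean(x, b1, b2):
    """Crisp membership: 1 inside the class interval [b1, b2], else 0."""
    return 1.0 if b1 <= x <= b2 else 0.0


def mf_continuous(x, b1, b2, d1, d2):
    """Fuzzy membership with transition widths d1 (lower) and d2 (upper)."""
    if b1 <= x <= b2:
        return 1.0
    if x < b1:
        # d1 == 0 means no lower transition zone: fall back to the crisp value.
        return 0.0 if d1 == 0 else 1.0 / (1.0 + ((x - b1) / d1) ** 2)
    return 0.0 if d2 == 0 else 1.0 / (1.0 + ((x - b2) / d2) ** 2)


# Hypothetical class "moderately deep soil": kernel 50-100 cm, transitions 10 cm.
for depth in (35, 40, 50, 75, 100, 110, 130):
    crisp = mf_boolean(depth, 50, 100)
    fuzzy = mf_continuous(depth, 50, 100, 10, 10)
    print(depth, crisp, round(fuzzy, 3))
```

A depth of 40 cm is rejected outright by the crisp function but receives a membership of 0.5 from the continuous one, which is the behaviour the transition zone is meant to capture.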

An advantage of fuzzy set theory is that it permits the use of natural language
to describe uncertainty, for example, "near," "east of" and "about 23 km from,"
as such natural language expressions can be more faithfully represented by
appropriately chosen membership functions.


Figure 5.8: (a) Crisp (Boolean) and (b) uncertain (fuzzy) membership functions
MF, plotted against x with MF values from 0.0 to 1.0. After Heuvelink [21].
5.2.3 Attribute accuracy

We can identify two types of attribute accuracy. These relate to the type of
data we are dealing with:

• For nominal or categorical data, the accuracy of labelling (for example the
type of land cover, road surface, etc).

• For numerical data, numerical accuracy (such as the concentration of
pollutants in the soil, height of trees in forests, etc).

It follows that depending on the data type, assessment of attribute accuracy may 
range from a simple check on the labelling of features—for example, is a road 
classified as a metalled road actually surfaced or not?—to complex statistical 
procedures for assessing the accuracy of numerical data, such as the percentage 
of pollutants present in the soil. 

When spatial data are collected in the field, it is relatively easy to check on
the appropriate feature labels. In the case of remotely sensed data, however,
considerable effort may be required to assess the accuracy of the
classification procedures. This is usually done by means of checks at a number
of sample points. The field data are then used to construct an error matrix
(also known as a confusion or misclassification matrix) that can be used to
evaluate the accuracy of the classification. An example is provided in
Table 5.1, where three land use types are identified. For 62 check points that
are forest, the classified image identifies them as forest. However, two forest
check points are classified in the image as agriculture. Vice versa, five
agriculture points are classified as forest. Observe that correct
classifications are found on the main diagonal of the matrix, which sums up to
92 correctly classified points out of 100 in total.


                                Reference data
Classified image      Forest   Agriculture   Urban   Total
Forest                    62             5       0      67
Agriculture                2            18       0      20
Urban                      0             1      12      13
Total                     64            24      12     100

Table 5.1: Example of a simple error matrix for assessing map attribute
accuracy. The overall accuracy is (62+18+12)/100 = 92%.
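
The arithmetic behind Table 5.1 can be reproduced with a few lines of Python. The sketch below uses the values of the table (rows are the classified image, columns the reference data). The per-class ratios printed at the end are an addition for illustration and are not discussed in the text.

```python
# Rows: classified image; columns: reference data (values from Table 5.1).
classes = ["Forest", "Agriculture", "Urban"]
matrix = [
    [62, 5, 0],
    [2, 18, 0],
    [0, 1, 12],
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(classes)))  # main diagonal
overall_accuracy = correct / total

row_totals = [sum(row) for row in matrix]
col_totals = [sum(row[j] for row in matrix) for j in range(len(classes))]

print(f"overall accuracy: {overall_accuracy:.0%}")
for i, name in enumerate(classes):
    # Diagonal cell as a share of its row total and of its column total.
    by_row = matrix[i][i] / row_totals[i]
    by_col = matrix[i][i] / col_totals[i]
    print(f"{name}: {by_row:.2f} of classified points, {by_col:.2f} of reference points")
```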


For more details on attribute accuracy, the student is referred to Principles of 
Remote Sensing [53]. 
5.2.4 Temporal accuracy

As noted, the number of spatial data sets and the volume of archived remotely
sensed data have increased enormously over the last decade. These data can
provide useful temporal information such as changes in land ownership and the
monitoring of environmental processes such as deforestation. Analogous to its
positional and attribute components, the quality of spatial data may also be
assessed in terms of its temporal accuracy. For a static feature this refers to
the difference in the values of its coordinates at two different times.

This includes not only the accuracy and precision of time measurements (for
example, the date of a survey), but also the temporal consistency of different
data sets. Because the positional and attribute components of spatial data may
change together or independently, it is also necessary to consider their
temporal validity. For example, the boundaries of a land parcel may remain
fixed over a period of many years whereas the ownership attribute may change
more frequently.
5.2.5 Lineage

Lineage describes the history of a data set. In the case of published maps, some
lineage information may be provided as part of the metadata, in the form of a
note on the data sources and procedures used in the compilation of the data.
Examples include the date and scale of aerial photography, and the date of field
verification. Especially for digital data sets, however, lineage may be defined
more formally as:


"that part of the data quality statement that contains information that
describes the source of observations or materials, data acquisition and
compilation methods, conversions, transformations, analyses and derivations
that the data has been subjected to, and the assumptions and criteria applied
at any stage of its life." [14]


All of these aspects affect other aspects of quality, such as positional accuracy. 
Clearly, if no lineage information is available, it is not possible to adequately 
evaluate the quality of a data set in terms of 'fitness for use'. 
5.2.6 Completeness

Completeness refers to whether there are data lacking in the database compared
to what exists in the real world. Essentially, it is important to be able to
assess what does and what does not belong to a complete dataset as intended by
its producer. It might be incomplete (i.e. it is 'missing' features which exist
in the real world), or overcomplete (i.e. it contains 'extra' features which do
not belong within the scope of the data set as it is defined).

Completeness can relate to either spatial, temporal, or thematic aspects of a
data set. For example, a data set of property boundaries might be spatially
incomplete because it contains only 10 out of 12 suburbs; it might be
temporally incomplete because it does not include recently subdivided
properties; and it might be thematically overcomplete because it also includes
building footprints.
5.2.7 Logical consistency

For any particular application, (predefined) logical rules concern: 

• The compatibility of data with other data in a data set (e.g. in terms of data 
format), 

• The absence of any contradictions within a data set, 

• The topological consistency of the data set, and 

• The allowed attribute value ranges, as well as combinations of attributes. 
For example, attribute values for population, area, and population density 
must agree for all entities in the database. 

The absence of any inconsistencies does not necessarily imply that the data are 
accurate. 
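
The population, area and density rule mentioned above lends itself to an automated check. The following Python sketch is one possible form of such a check. The record layout, the field names and the 1% tolerance are assumptions made for the example.

```python
def density_inconsistencies(records, rel_tol=0.01):
    """Return ids of records whose stored density disagrees with population / area."""
    bad = []
    for rec in records:
        if rec["area_km2"] <= 0:
            bad.append(rec["id"])  # an area must be positive
            continue
        expected = rec["population"] / rec["area_km2"]
        if abs(expected - rec["density"]) > rel_tol * max(expected, 1e-9):
            bad.append(rec["id"])
    return bad


# Hypothetical municipalities; the third record carries a stale density value.
records = [
    {"id": "A", "population": 12000, "area_km2": 40.0, "density": 300.0},
    {"id": "B", "population": 500, "area_km2": 12.5, "density": 40.0},
    {"id": "C", "population": 9000, "area_km2": 30.0, "density": 250.0},
]
print(density_inconsistencies(records))
```

Passing this check shows only that the three attributes agree with one another. It says nothing about whether the population figure itself is correct, which is the point made in the closing sentence of this section.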
5.3 Data preparation


Spatial data preparation aims to make the acquired spatial data fit for use.
Images may require enhancements and corrections of the classification scheme of
the data. Vector data also may require editing, such as the trimming of
overshoots of lines at intersections, deleting duplicate lines, closing gaps in
lines, and generating polygons. Data may require conversion to either vector
format or raster format to match other data sets which will be used in the
analysis. Additionally, the data preparation process includes associating
attribute data with the spatial features through either manual input or reading
digital attribute files into the GIS/DBMS.

The intended use of the acquired spatial data may require only a subset of the
original data set, as only some of the features are relevant for subsequent
analysis or subsequent map production. In these cases, data and/or cartographic
generalization can be performed on the original data set.
5.3.1 Data checks and repairs

Acquired data sets must be checked for quality in terms of the accuracy,
consistency and completeness parameters discussed above. Often, errors can be
identified automatically, after which manual editing methods can be applied to
correct the errors. Alternatively, some software may identify and automatically
correct certain types of errors. Below, we focus on the geometric, topological,
and attribute components of spatial data.

'Clean-up' operations are often performed in a standard sequence. For example,
crossing lines are split before dangling lines are erased, and nodes are created
at intersections before polygons are generated. These are illustrated in
Table 5.2.

With polygon data, one usually starts with many polylines, in an unwieldy
format known as spaghetti data, that are combined in the first step (from
Figure 5.9(a) to (b)). This results in fewer polylines with more internal
vertices. Then, polygons can be identified (c). Sometimes, polylines that
should connect to form closed boundaries do not, and therefore must be
connected (either manually or automatically); this step is not indicated in the
figure. In a final step, the elementary topology of the polygons can be derived
(d).
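
Two of the automatic checks mentioned above, removing duplicate lines and finding dangling line ends, can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the procedure of any particular GIS: lines are lists of coordinate tuples, endpoints that should coincide are assumed to have identical coordinates already, and the function names are hypothetical.

```python
from collections import Counter


def remove_duplicates(lines):
    """Drop repeated lines, treating a line and its reverse as the same line."""
    seen, unique = set(), []
    for line in lines:
        key = min(tuple(line), tuple(reversed(line)))
        if key not in seen:
            seen.add(key)
            unique.append(line)
    return unique


def dangling_endpoints(lines):
    """Return endpoints that are used by exactly one line (candidate dangles)."""
    counts = Counter()
    for line in lines:
        counts[line[0]] += 1
        counts[line[-1]] += 1
    return sorted(pt for pt, n in counts.items() if n == 1)


# Hypothetical digitized lines: a closed triangle, one duplicate, one overshoot.
lines = [
    [(0, 0), (4, 0)],
    [(4, 0), (2, 3)],
    [(2, 3), (0, 0)],
    [(4, 0), (0, 0)],      # duplicate of the first line, digitized in reverse
    [(4, 0), (5, 0)],      # short overshoot ending in a dangling node
]
cleaned = remove_duplicates(lines)
print(len(cleaned), dangling_endpoints(cleaned))
```

Whether a reported dangle is an error (an overshoot) or a legitimate dead end, such as a cul-de-sac, still has to be judged by an operator.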
Table 5.2: Clean-up operations for vector data. Columns: Before cleanup, After
cleanup, Description.

Figure 5.9: Successive clean-up operations for vector data, turning spaghetti
data into topological structure. Panel (b): Spaghetti data (cleaned).
Associating attributes

Attributes may be automatically associated with features that have unique
identifiers. We have already discussed these techniques in Section 3.5. In the
case of vector data, attributes are assigned directly to the features, while in
a raster the attributes are assigned to all cells that represent a feature.
Section 5.2.3 discusses issues relating to raster attribute accuracy in more
detail.
Rasterization or vectorization

Vectorization produces a vector data set from a raster. We have looked at this in
some sense already: namely in the production of a vector set from a scanned
image. Another form of vectorization takes place when we want to identify
features or patterns in remotely sensed imagery. The keywords here are feature
extraction and pattern recognition, which are dealt with in Principles of Remote
Sensing [53].

If much or all of the subsequent spatial data analysis is to be carried out on 
raster data, one may want to convert vector data sets to raster data. This process 
is known as rasterization. It involves assigning point, line and polygon attribute 
values to raster cells that overlap with the respective point, line or polygon. To 
avoid information loss, the raster resolution should be carefully chosen on the 
basis of the geometric resolution. A cell size which is too large may result in 
cells that cover parts of multiple vector features, and then ambiguity arises as to 
what value to assign to the cell. If, on the other hand, the cell size is too small, 
the file size of the raster may increase significantly. 
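
A minimal Python sketch of polygon rasterization is given below. It assumes one common assignment rule, namely that a cell takes the value of the polygon containing its centre. Other rules (largest share of the cell, for example) are equally possible. The function names and the two land-use polygons are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygons, x_min, y_max, cell_size, n_rows, n_cols, nodata=None):
    """Assign each cell the attribute of the polygon containing its centre."""
    grid = [[nodata] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            cx = x_min + (c + 0.5) * cell_size
            cy = y_max - (r + 0.5) * cell_size
            for value, polygon in polygons:
                if point_in_polygon(cx, cy, polygon):
                    grid[r][c] = value
                    break
    return grid


# Hypothetical land-use polygons on a 4 x 4 extent.
polygons = [
    ("F", [(0, 0), (2, 0), (2, 4), (0, 4)]),   # forest: western half
    ("A", [(2, 0), (4, 0), (4, 4), (2, 4)]),   # agriculture: eastern half
]
grid = rasterize(polygons, x_min=0, y_max=4, cell_size=1, n_rows=4, n_cols=4)
for row in grid:
    print(row)
```

Calling the same function with `cell_size=4` and a 1 x 1 grid produces a single cell that covers both polygons yet stores only one value, which is the ambiguity caused by too large a cell size.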

Rasterization itself could be seen as a 'backwards step': firstly, raster
boundaries are only an approximation of the objects' original boundaries.
Secondly, the original 'objects' can no longer be treated as such, as they have
lost their topological properties. Often the reason for rasterization is that
it facilitates easier combination with other data sources also in raster
formats, and/or that there are several analytical techniques which are easier
to perform upon raster data (please refer to Chapter 6). An alternative to
rasterization is to not perform it during the data preparation phase, but to use
GIS rasterization functions on-the-fly, that is, when the computations call for
it. This allows keeping the vector data and generating raster data from them
when needed. Obviously, the issue of performance trade-off must be looked into.
Topology generation

We have already discussed derivation of topology from vectorized data sources.
However, more topological relations may sometimes be needed, for instance in
networks, e.g. the questions of line connectivity, flow direction, and which
lines have over- and underpasses. For polygons, questions that may arise involve
polygon inclusion: Is a polygon inside another one, or is the outer polygon
simply around the inner polygon? Many of these questions are mostly questions of
data semantics, and can therefore usually only be answered by a human operator.
5.3.2 Combining data from multiple sources

A GIS project usually involves multiple data sets, so the next step addresses the
issue of how these multiple sets relate to each other. There are four fundamental
cases to be considered in the combination of data from different sources:


1. They may be about the same area, but differ in accuracy,

2. They may be about the same area, but differ in choice of representation,

3. They may be about adjacent areas, and have to be merged into a single data
set,

4. They may be about the same or adjacent areas, but referenced in different
coordinate systems.

We look at these situations below. They are best understood with an example.
Differences in accuracy


Issues relating to positional error were outlined in Section 5.2.2, while attribute
accuracy and temporal accuracy issues were discussed in Sections 5.2.3 and 5.2.4
respectively. These are clearly relevant in any combination of data sets which
may themselves have varying levels of accuracy.

Images come at a certain resolution, and paper maps at a certain scale. This
typically results in differences of resolution of acquired data sets, all the
more since map features are sometimes intentionally displaced to improve
readability of the map. For instance, the course of a river will only be
approximated roughly on a small-scale map, and a village on its northern bank
should be depicted north of the river, even if this means it has to be displaced
on the map a little bit. The small scale causes an accuracy error. If we want to
combine a digitized version of that map with a digitized version of a
large-scale map, we must be aware that features may not be where they seem to
be. Analogous examples can be given for images at different resolutions.



Figure 5.10: The integration of two vector data sets, which represent the same
phenomenon, may lead to sliver polygons.


In Figure 5.10, the polygons of two digitized maps at different scales are
overlaid. Due to scale differences in the sources, the resulting polygons do not
perfectly coincide, and polygon boundaries cross each other. This causes small,
artefact polygons in the overlay known as sliver polygons. If the map scales
involved differ significantly, the polygon boundaries of the large-scale map
should probably take priority, but when the differences are slight, we need
interactive techniques to resolve the issues.
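
Before slivers can be resolved interactively, they have to be found. One simple heuristic, sketched below in Python, flags polygons that are both small and elongated. The compactness measure 4πA/P² and both thresholds are assumptions chosen for the example, not a method prescribed by the text.

```python
import math


def area_and_perimeter(polygon):
    """Shoelace area and perimeter of a simple polygon given as (x, y) vertices."""
    area2, perimeter = 0.0, 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        area2 += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(area2) / 2.0, perimeter


def is_sliver(polygon, max_area, max_compactness=0.2):
    """Flag small, elongated polygons; both thresholds are illustrative."""
    area, perimeter = area_and_perimeter(polygon)
    compactness = 4.0 * math.pi * area / perimeter ** 2  # 1.0 for a circle
    return area <= max_area and compactness <= max_compactness


# Hypothetical overlay result: one ordinary parcel and one thin artefact.
parcel = [(0, 0), (100, 0), (100, 80), (0, 80)]
artefact = [(0, 80), (100, 80), (100, 80.5), (0, 81)]
print(is_sliver(parcel, max_area=200), is_sliver(artefact, max_area=200))
```

The flagged polygons are candidates only. Deciding which neighbouring polygon should absorb a sliver remains a matter for the interactive step.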

There can be good reasons for having data sets at different scales. A good
example is found in mapping organizations; European organizations maintain a
single source database that contains the base data. This database is essentially
scale-less and contains all data required for even the largest scale map to be
produced. For each map scale that the mapping organization produces, they derive
a separate database from the foundation data. Such a derived database may be
called a cartographic database as the data stored are elements to be printed on
a map, including, for instance, data on where to place name tags, and what
colour to give them. This may mean the organization has one database for the
larger scale ranges (1:5,000-1:10,000) and other databases for the smaller scale
ranges. They maintain a multi-scale data environment.
Differences in representation

We have already talked about the various ways to represent spatial data.
Sometimes data is acquired as point samples or observations, other times it is
in the form of polygons with attribute data. When points need to be translated
into rasters, we need to perform something known as point data transformation,
which is discussed in Section 5.4.

Some advanced GIS applications require the possibility of representing the same
geographic phenomenon in different ways. These are called multi-representation
systems. The production of maps at various scales is an example, but there are
numerous others. The commonality is that phenomena must sometimes be viewed as
points, and at other times as polygons. For example, a small-scale national road
network analysis may represent villages as point objects, but a nation-wide
urban population density study should regard all municipalities as represented
by polygons. The complexity that this requirement entails is that the GIS or the
DBMS must keep track of links between different representations for the same
phenomenon, and must also provide support for decisions as to which
representations to use in which situation.

The links between various representations for the same object maintained by the
system allow switching between them, and many fancy applications of their use
seem possible. A comparison is illustrated in Figure 5.11.
Figure 5.11: Multi-scale and multi-representation systems compared; the main
difference is that multi-representation systems have a built-in 'understanding'
that different representations belong together.
Merging data sets of adjacent areas

When individual data sets have been prepared as described above, they sometimes
have to be matched into a single 'seamless' data set, whilst ensuring that the
appearance of the integrated geometry is as homogeneous as possible. Edge
matching is the process of joining two or more map sheets, for instance, after
they have separately been digitized.



Figure 5.12: Multiple adjacent data sets, after cleaning, can be matched and
merged into a single one.


Merging adjacent data sets can be a major problem. Some GIS functions, such
as line smoothing and data clean-up (removing duplicate lines), may have to be
performed. Figure 5.12 illustrates a typical situation. Some GISs have merge
or edge-matching functions to solve the problem arising from merging adjacent
data. At the map sheet edges, feature representations have to be matched in
order for them to be combined. Coordinates of the objects along shared borders
are adjusted to match those in the neighbouring data sets. Mismatches may still
occur, so a visual check and interactive editing are likely to be required.
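
The coordinate adjustment along a shared border can be sketched as a nearest-endpoint snap within a tolerance. The Python example below is a simplified illustration. It treats sheet A as the reference, and the function name, the tolerance and the coordinates are all hypothetical.

```python
import math


def edge_match(sheet_a_ends, sheet_b_ends, tolerance):
    """Snap endpoints of sheet B onto the nearest endpoint of sheet A within tolerance.

    Returns the adjusted sheet B endpoints and the indices left unmatched.
    """
    adjusted, unmatched = [], []
    for i, (bx, by) in enumerate(sheet_b_ends):
        nearest = min(sheet_a_ends, key=lambda a: math.hypot(a[0] - bx, a[1] - by))
        if math.hypot(nearest[0] - bx, nearest[1] - by) <= tolerance:
            adjusted.append(nearest)
        else:
            adjusted.append((bx, by))
            unmatched.append(i)
    return adjusted, unmatched


# Hypothetical line endpoints along the shared border x = 1000 of two map sheets.
sheet_a = [(1000.0, 120.0), (1000.0, 480.0), (1000.0, 770.0)]
sheet_b = [(1000.4, 119.7), (999.8, 481.1), (1000.1, 650.0)]
adjusted, unmatched = edge_match(sheet_a, sheet_b, tolerance=2.0)
print(adjusted)
print("needs manual check:", unmatched)
```

The third endpoint of sheet B has no counterpart within the tolerance and is reported rather than moved, which corresponds to the mismatches that call for a visual check.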
Differences in coordinate systems

Chapter 4 provided an introduction to coordinate systems, datums and map
projections. Map projections provide means to map geographic coordinates onto a
flat surface (for map production), and vice versa. It may be the case that data
layers which are to be combined or merged in some way are referenced in
different coordinate systems, or are based upon different datums. As a result,
data may need coordinate transformation (Figure 4.22), or both a coordinate
transformation and datum transformation (Figure 4.23). It may also be the case
that data has been digitized from an existing map or data layer (Section 5.1.2).
In this case, geometric transformations help to transform device coordinates
(coordinates from digitizing tablets or screen coordinates) into world
coordinates (geographic coordinates, meters, etc.).
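
As an illustration of a geometric transformation from device to world coordinates, the Python sketch below derives an affine transformation from three control points and applies it. The choice of an affine model, the function names and the control point values are assumptions. In practice more control points and a least-squares fit would normally be used.

```python
def solve3(rows, rhs):
    """Solve a 3 x 3 linear system by Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det(rows)
    if d == 0:
        raise ValueError("control points are collinear")
    solution = []
    for k in range(3):
        m = [row[:] for row in rows]
        for i in range(3):
            m[i][k] = rhs[i]  # replace column k by the right-hand side
        solution.append(det(m) / d)
    return solution


def fit_affine(device_pts, world_pts):
    """Return (a, b, c, d, e, f) with X = a*u + b*v + c and Y = d*u + e*v + f."""
    rows = [[u, v, 1.0] for u, v in device_pts]
    x_params = solve3(rows, [x for x, _ in world_pts])
    y_params = solve3(rows, [y for _, y in world_pts])
    return tuple(x_params + y_params)


def to_world(params, u, v):
    """Apply the affine transformation to device coordinates (u, v)."""
    a, b, c, d, e, f = params
    return a * u + b * v + c, d * u + e * v + f


# Hypothetical control points: tablet coordinates (cm) and map coordinates (m).
device = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
world = [(500000.0, 4200000.0), (501000.0, 4200000.0), (500000.0, 4201000.0)]
params = fit_affine(device, world)
print(to_world(params, 5.0, 5.0))
```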
Other data preparation functions

A range of other data preparation functions exist that support conversion or
adjustment of the acquired data to format requirements that have been defined
for data storage purposes. These include:

• Format transformation functions. These convert between data formats of
different systems or representations, e.g. reading a DXF file into a GIS.
Although we will not focus on the technicalities here, the user should be
warned that conversions from one format to another may cause problems. The
reason is that not all formats can capture the same information, and therefore
conversions often mean loss of information. If one obtains a spatial data set
in format F, but needs it in format G (for instance because the locally
preferred GIS package requires it), then usually a conversion function can be
found, often within the same GIS software package. The key to successful
conversion is to also find an inverse conversion, back from G to F, and to
ascertain whether the double conversion back to F results in the same data set
as the original. If this is the case, neither conversion causes information
loss, and both can safely be applied.

• Graphic element editing. Manual editing of digitized features so as to correct
errors, and to prepare a clean data set for topology building.

• Coordinate thinning. A process that is often applied to remove redundant
or excess vertices from line representations, as obtained from digitizing.
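
Coordinate thinning can be illustrated with the Douglas-Peucker algorithm, a widely used line simplification method. The text does not name a specific algorithm, so its use here is an assumption, as are the function names, the tolerance and the sample line.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length


def thin(vertices, tolerance):
    """Douglas-Peucker thinning: keep vertices deviating more than tolerance."""
    if len(vertices) < 3:
        return list(vertices)
    distances = [point_line_distance(p, vertices[0], vertices[-1])
                 for p in vertices[1:-1]]
    index = max(range(len(distances)), key=distances.__getitem__)
    if distances[index] <= tolerance:
        return [vertices[0], vertices[-1]]
    split = index + 1  # position of the farthest vertex in the full list
    left = thin(vertices[: split + 1], tolerance)
    right = thin(vertices[split:], tolerance)
    return left[:-1] + right


# Hypothetical digitized line with small jitter and one real bend.
line = [(0, 0), (1, 0.05), (2, -0.04), (3, 0.02), (4, 0), (5, 3), (6, 6)]
print(thin(line, tolerance=0.1))
```

With a tolerance of 0.1 the jittered vertices and the collinear vertex (5, 3) are dropped, while the bend at (4, 0) is kept.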
5.4 Point data transformation


This section looks at several methods of transforming point data in a GIS. We
may have captured a sample of points (or acquired a dataset of such points), but
wish to derive a value for the phenomenon at another location or for the whole
extent of our study area.

We may want to transform our points into other representations in order to
facilitate interpretation and/or integration with other data. Examples include
defining homogeneous areas (polygons) from our point data, or deriving contour
lines. This is generally referred to as interpolation, i.e. the calculation of a
value from 'surrounding' observations. The principle of spatial autocorrelation
plays a central part in the process of interpolation (see Section 2.3).

In order to predict the value of a point for a given (x, y) location, we could
simply find the 'nearest' known value to the point, and assign that value. This
is the simplest form of interpolation, known as nearest-neighbour interpolation.
We might instead choose to use the distance that points are away from (x, y) to
weight their importance in our calculation.

In some instances we may be dealing with a data type that limits the type of
interpolation we can do (refer to page 75 for a brief background). A fundamental
issue in this respect is what kind of phenomena we are considering: is it a
discrete field (such as geological units, for instance), in which the values are
of a qualitative nature and the data is categorical, or is it a continuous field
(like elevation, temperature, or salinity), in which the values are of a
quantitative nature, and represented as continuous measurements? This
distinction matters, because we are limited to nearest-neighbour interpolation
for discrete data.*




Figure 5.13: A geographic field representation obtained from two point
measurements: (a) for qualitative (categorical), and (b) for quantitative
(continuous) point measurements. The value measured at P is represented as dark
green, that at Q as light green.


A simple example is given in Figure 5.13. Our field survey has taken only two 
measurements, one at P and one at Q. The values obtained in these two locations 
are represented by a dark and light green tint, respectively. If we are dealing 
with qualitative data, and we have no further knowledge, the only assumption 
we can make for other locations is that those nearer to P probably have P's 
value, whereas those nearer to Q have Q's value. This is illustrated in part (a). 

If, on the contrary, our field is quantitative, we can let the values of P and Q both 
contribute to values for other locations. This is done in part (b) of the figure. 
To what extent the measurements contribute is determined by the interpolation 
function. In the figure, the contribution is expressed in terms of the ratio of 
distances to P and Q. We will see in the sequel that the choice of interpolation 
function is a crucial factor in any method of point data transformation. 
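
The two cases of Figure 5.13 can be sketched in Python as follows. The quantitative case assumes the simplest reading of "ratio of distances", namely that each value is weighted by the distance to the other point. The function names and the sample values are hypothetical.

```python
import math


def nearest_value(loc, p, q, value_p, value_q):
    """Qualitative field: take the value of whichever measurement is nearer."""
    return value_p if math.dist(loc, p) <= math.dist(loc, q) else value_q


def distance_ratio_value(loc, p, q, value_p, value_q):
    """Quantitative field: weight each value by the distance to the other point."""
    dp, dq = math.dist(loc, p), math.dist(loc, q)
    return (dq * value_p + dp * value_q) / (dp + dq)


# Hypothetical survey: P and Q lie 10 units apart.
P, Q = (0.0, 0.0), (10.0, 0.0)
print(nearest_value((2.0, 0.0), P, Q, "clay", "sand"))
print(distance_ratio_value((2.0, 0.0), P, Q, 20.0, 30.0))
```

At a location two units from P and eight from Q, the qualitative field takes P's category, whereas the quantitative field takes a value much closer to P's measurement than to Q's.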

* Please refer to Section 2.2.3 for a background discussion of both discrete and
continuous fields.
How we represent a field constructed from point measurements in the GIS also
depends on the above distinction. A discrete field can either be represented as
a classified raster or as a polygon data layer, in which each polygon has been
assigned a (constant) field value. A continuous field can be represented as an
unclassified raster, as an isoline (thus, vector) data layer, or perhaps as a
TIN. Some GIS software provides only the option of generating raster output,
requiring an intermediate step of raster to vector conversion. The choice of
representation depends on what will be done with the data in the analysis phase.
5.4.1 Interpolating discrete data

If we are dealing with discrete (nominal, categorical or ordinal) data, we are
effectively restricted to using nearest-neighbour interpolation. This is the
situation shown in Figure 5.13(a), though usually we would have many more
points. In a nearest-neighbour interpolation, each location is assigned the
value of the closest measured point. This technique constructs 'zones' around
the points of measurement, and every location within a zone is assigned the
value of that zone's point. In other words, it assigns an existing value (or
category) to a location.

If the desired output was a polygon layer, we could construct Thiessen polygons
around the points of measurement. The boundaries of such polygons, by
definition, are the locations for which more than one point of measurement is
the closest point. An illustration is provided in Figure 5.14. Thiessen
polygons are further discussed on page 398. If the desired output was in the
form of a raster layer, we could rasterize the Thiessen polygons. This was
discussed in Section 5.3.1.
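
For raster output, the same result can be obtained without constructing the polygons first, by assigning each cell the category of the sample nearest to its centre. The Python sketch below illustrates this. The function name, the grid parameters and the geological samples are hypothetical.

```python
import math


def nearest_neighbour_raster(samples, x_min, y_max, cell_size, n_rows, n_cols):
    """Give each cell the category of the sample closest to the cell centre."""
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            centre = (x_min + (c + 0.5) * cell_size, y_max - (r + 0.5) * cell_size)
            _, category = min(samples, key=lambda s: math.dist(centre, s[0]))
            row.append(category)
        grid.append(row)
    return grid


# Hypothetical geological observations: ((x, y), unit).
samples = [((1.0, 1.0), "granite"), ((5.0, 1.0), "shale"), ((3.0, 5.0), "basalt")]
raster = nearest_neighbour_raster(samples, x_min=0, y_max=6, cell_size=2,
                                  n_rows=3, n_cols=3)
for row in raster:
    print(row)
```

A cell centre that is equally close to two samples lies on a Thiessen boundary. In this sketch such a tie simply goes to the sample listed first, which is an arbitrary choice.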
Figure 5.14: Generation of Thiessen polygons for qualitative point
measurements. The measured points are indicated in dark green; the darker area
indicates all locations assigned with the measurement value of the central
point.
5.4.2 Interpolating continuous data

Interpolation of values from continuous measurements is significantly more
complex. This is the situation of Figure 5.13(b), but again, usually with many
more point measurements.

Since the data are continuous, we can make use of measured values for
interpolation. There are many continuous geographic fields: elevation,
temperature and ground water salinity are just a few examples. Commonly,
continuous fields are represented as rasters, and we will almost by default
assume that they are. Alternatives exist though, as we have seen in discussions
in Chapter 2. The main alternative for continuous field representation is a
polyline vector layer, in which the lines are isolines. We will also address
these issues of representation below.

The aim is to use measurements to obtain a representation of the entire field
using point samples. In this section we outline four techniques to do so:

1. Trend surface fitting using regression,

2. Triangulation,

3. Spatial moving averages using inverse distance weighting,

4. Kriging.
Trend surface fitting

In trend surface fitting, the assumption is that the entire study area can be
represented by a formula $f(x, y)$ that for a given location with coordinates
$(x, y)$ will give us the approximated value of the field in that location.

The key objective in trend surface fitting is to derive a formula that best
describes the field. Various classes of formulae exist, with the simplest being
the one that describes a flat, but tilted plane:

$$
f(x, y) = c_1 \cdot x + c_2 \cdot y + c_3.
$$

If we believe (and this judgement must be based on domain expertise) that
the field under consideration can be best approximated by a tilted plane, then
the problem of finding the best plane is the problem of determining best values
for the coefficients $c_1$, $c_2$ and $c_3$. This is where the point measurements
earlier obtained become important. Statistical techniques known as regression
techniques can be used to determine values for these coefficients $c_i$ that best
fit with the measurements. A plane will be fitted through the measurements that
makes the smallest overall error with respect to the original measurements.

In Figure 5.15, we have used the same set of point measurements, with four
different approximation functions. Part (a) has been determined under the
assumption that the field can be approximated by a tilted plane, in this case
with a downward slope to the southeast. The values found by regression
techniques were: $c_1 = -1.83934$, $c_2 = 1.61645$ and $c_3 = 70.8782$, giving us:

$$
f(x, y) = -1.83934 \cdot x + 1.61645 \cdot y + 70.8782.
$$
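
How regression arrives at such coefficients can be sketched with an ordinary least-squares fit, solved here through the normal equations in plain Python. This is an illustration only. The original measurements are not available, so the sample points are hypothetical and have been placed exactly on the plane reported for Figure 5.15(a), which the fit should then recover.

```python
def fit_plane(points):
    """Least-squares fit of f(x, y) = c1*x + c2*y + c3 to (x, y, value) triples."""
    # Build the normal equations (A^T A) c = A^T z for rows A_i = (x_i, y_i, 1).
    ata = [[0.0] * 3 for _ in range(3)]
    atz = [0.0] * 3
    for x, y, z in points:
        row = (x, y, 1.0)
        for i in range(3):
            atz[i] += row[i] * z
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    m = [ata[i] + [atz[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    coeffs = [0.0] * 3
    for i in (2, 1, 0):
        tail = sum(m[i][k] * coeffs[k] for k in range(i + 1, 3))
        coeffs[i] = (m[i][3] - tail) / m[i][i]
    return coeffs


def plane(x, y):
    """The tilted plane reported for Figure 5.15(a)."""
    return -1.83934 * x + 1.61645 * y + 70.8782


# Hypothetical sample locations on a 3 x 3 grid.
points = [(x, y, plane(x, y)) for x in (0, 5, 10) for y in (0, 4, 8)]
c1, c2, c3 = fit_plane(points)
print(round(c1, 5), round(c2, 5), round(c3, 4))
```

With real measurements the points would not lie exactly on a plane, and the fitted coefficients would be those that minimize the sum of squared differences between the plane and the measurements.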
Figure 5.15: Various global trend surfaces obtained from regression techniques:
(a) simple tilted plane; (b) bilinear saddle; (c) quadratic surface; (d) cubic
surface. Values range from white (low), via blue, and light green to dark green
(high).
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Clearly, not all fields are representable as simple, tilted planes. Sometimes, the 
theory of the application domain will dictate that the best approximation of the 
field is a more complicated, higher-order polynomial function. Three such func¬ 
tions were the basis for the fields illustrated in Figure 5.15(b)-(d). 

The simplest extension from a tilted plane, that of a bilinear saddle, expresses
some dependency between the x and y dimensions:

$$f(x, y) = c_1 \cdot x + c_2 \cdot y + c_3 \cdot xy + c_4.$$

This is illustrated in part (b). A further step up the ladder of complexity is to
consider quadratic surfaces, described by:

$$f(x, y) = c_1 \cdot x^2 + c_2 \cdot x + c_3 \cdot y^2 + c_4 \cdot y + c_5 \cdot xy + c_6.$$

The objective is to find six values for our coefficients that best match with the 
measurements. A bilinear saddle and a quadratic surface have been fitted through 
our measurements in Figure 5.15(b) and (c), respectively. 

Part (d) of the figure illustrates the most complex formula of the surfaces in 
Figure 5.15, the cubic surface. It is characterized by the following formula: 


$$f(x, y) = c_1 \cdot x^3 + c_2 \cdot x^2 + c_3 \cdot x + c_4 \cdot y^3 + c_5 \cdot y^2 + c_6 \cdot y + c_7 \cdot x^2 y + c_8 \cdot x y^2 + c_9 \cdot xy + c_{10}.$$


The regression techniques applied for Figure 5.15 determined the following
values for the coefficients $c_i$:

Fig 5.15   (a)         (b)         (c)            (d)
c_1        -1.83934    -5.61587    0.000921084    -0.473086
c_2        1.61645     -2.95355    -5.02674       6.88096
c_3        70.8782     0.993638    -1.34779       31.5966
c_4        -           89.0418     7.23557        -0.233619
c_5        -           -           0.813177       1.48351
c_6        -           -           76.9177        -2.52571
c_7        -           -           -              -0.115743
c_8        -           -           -              -0.052568
c_9        -           -           -              2.16927
c_10       -           -           -              96.8207


Trend surface fitting is a useful technique of continuous field approximation,
though determining the 'best fit' values for the coefficients $c_i$ is a
time-consuming operation, especially with many point measurements. Once these
best values have been determined, we know the formula, making it possible to
compute an approximated value for any location in the study area.
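
Once the coefficients are known, evaluating a trend surface is plain arithmetic. The sketch below is an illustration only: it stores the coefficients listed for Figure 5.15 and evaluates each of the four formulae given above. The names `COEFFS`, `terms` and `trend_value` are assumptions introduced for this example.

```python
# Coefficients as listed for Figure 5.15, in the order c1, c2, ...
COEFFS = {
    "a": [-1.83934, 1.61645, 70.8782],
    "b": [-5.61587, -2.95355, 0.993638, 89.0418],
    "c": [0.000921084, -5.02674, -1.34779, 7.23557, 0.813177, 76.9177],
    "d": [-0.473086, 6.88096, 31.5966, -0.233619, 1.48351, -2.52571,
          -0.115743, -0.052568, 2.16927, 96.8207],
}


def terms(part, x, y):
    """Polynomial terms, in the same order as the coefficients of that formula."""
    if part == "a":   # tilted plane
        return [x, y, 1.0]
    if part == "b":   # bilinear saddle
        return [x, y, x * y, 1.0]
    if part == "c":   # quadratic surface
        return [x * x, x, y * y, y, x * y, 1.0]
    if part == "d":   # cubic surface
        return [x ** 3, x * x, x, y ** 3, y * y, y,
                x * x * y, x * y * y, x * y, 1.0]
    raise ValueError("unknown surface: " + part)


def trend_value(part, x, y):
    """Approximated field value at (x, y) for surface (a), (b), (c) or (d)."""
    return sum(c * t for c, t in zip(COEFFS[part], terms(part, x, y)))
```

For example, `trend_value("d", -2, 10)` and `trend_value("d", 12, 10)` evaluate the cubic surface at two locations near the edge of the area shown, which is one way to explore how the surfaces behave away from the measurements.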

It is possible to use trend surfaces for both global and local trends. Global trend
surface fitting is based on the assumption that the entire study area can be
approximated by the same mathematical surface. However, in many cases, the
assumption that a single formula can describe the field for the entire study area
is an unrealistic one. Capturing all the fluctuation of a natural geographic field
in a reasonably sized study area demands polynomials of extreme orders, and
these quickly become computationally impractical to determine.

It should also be noted that the spatial distribution of sample measurements has
a significant effect on the shape of the fitting function. This is especially true
for locations that are within the study area, but outside of the area within which
the measurements fall. These may be subject to a so-called edge effect, meaning
that the values obtained from the approximation function for edge locations may
be rather nonsensical. The reader is asked to judge whether such edge effects have
taken place in Figure 5.15. For these reasons, it is often useful to partition the
study area into parts that may actually be polynomially approximated. The
decision of how to partition the study area must be taken with care, and must be
guided by domain expertise. Once we have identified the parts, we may apply
the trend surface fitting techniques discussed earlier, and obtain an
approximation polynomial for each part.

Local trend surface fitting is not a popular technique in practical applications,
because it is relatively difficult to implement, and other techniques such
as moving windows are better for the representation and identification of local
trends.

If we know the polynomial, it is relatively simple to generate a raster layer, given
an appropriate cell resolution and an approximation function for the cell's value.
In some cases it is more accurate to assign the average of the computed values for
all of the cell's corner points. In order to generate a vector layer representing this
data, isolines can be derived for a given set of intervals. The specific techniques
of generating isolines are not discussed here; however, the triangulation
techniques discussed below can play a role.
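
A minimal sketch of this raster generation step is given below. It assumes a square cell size, a lower left origin, and rows listed from top to bottom; the function name `rasterize` and its parameters are hypothetical and chosen only for this illustration. The flag `use_corners` switches between the cell-centre value and the average of the four corner values.

```python
def rasterize(f, x_min, y_min, cell_size, n_cols, n_rows, use_corners=False):
    """Sample f(x, y) into a raster; the first row is the top row (largest y)."""
    raster = []
    for r in range(n_rows):
        y0 = y_min + (n_rows - 1 - r) * cell_size   # lower edge of this row
        row = []
        for c in range(n_cols):
            x0 = x_min + c * cell_size               # left edge of this column
            if use_corners:
                corners = [(x0, y0), (x0 + cell_size, y0),
                           (x0, y0 + cell_size), (x0 + cell_size, y0 + cell_size)]
                value = sum(f(x, y) for x, y in corners) / 4.0
            else:
                value = f(x0 + cell_size / 2.0, y0 + cell_size / 2.0)
            row.append(value)
        raster.append(row)
    return raster
```

For a tilted plane both options give the same cell values; they only differ for curved surfaces, where the choice may matter more as the cell size grows.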

Triangulation


Another way of interpolating point measurements is by triangulation. Triangulated
Irregular Networks (TINs) have already been discussed in some detail in
Section 2.3.3. Essentially, this technique constructs a triangulation of the study
area from the known measurement points. Preferably, the triangulation should
be a Delaunay triangulation.* After having obtained it, we may define for which
values of the field we want to construct isolines. For instance, for elevation, we
might want to have the 100 m-isoline, the 200 m-isoline, and so on. For each edge
of a triangle, a geometric computation can be performed that indicates which
isolines intersect it, and at what positions they do so. A list of computed
locations, all at the same field value, is used by the GIS to construct the isoline.
This is illustrated in Figure 5.16.
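
The geometric computation per triangle edge can be sketched as follows, under the assumption that the field varies linearly along each edge (the usual assumption for a TIN). The function names `edge_crossing` and `triangle_crossings` are hypothetical; vertices are given as `(x, y, value)` tuples.

```python
def edge_crossing(p, q, level):
    """Location where the isoline for 'level' crosses edge pq, or None."""
    (x1, y1, v1), (x2, y2, v2) = p, q
    if v1 == v2:
        # Flat edge: either no crossing, or the whole edge lies on the isoline.
        return None
    t = (level - v1) / (v2 - v1)      # fraction of the way from p to q
    if 0.0 <= t <= 1.0:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None


def triangle_crossings(triangle, level):
    """Crossing points of one isoline with the three edges of a triangle.

    If 'level' equals the value at a vertex, that vertex is reported once
    for each of the two edges that meet there.
    """
    a, b, c = triangle
    points = [edge_crossing(p, q, level) for p, q in ((a, b), (b, c), (c, a))]
    return [pt for pt in points if pt is not None]
```

Connecting the crossing points found in neighbouring triangles, all at the same field value, yields the isoline.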

Figure 5.16: Triangulation as a means of interpolation. (a) known point
measurements; (b) constructed triangulation on known points; (c) isolines
constructed from the triangulation.

*For more information on this type of triangulation, see Section 6.4.1.

Moving averages using inverse distance weighting (IDW)

Moving window averaging attempts to directly derive a raster dataset from a set
of sample points. This is why it is sometimes also called 'gridding'. The principle
behind this technique is illustrated in Figure 5.17. The cell values for the output
raster are computed one by one. To achieve this, a 'window' (also known as a
kernel) is defined, and initially placed over the top left raster cell. Measurement
points falling inside the window contribute to the averaging computation, those
outside the window do not. This is why moving window averaging is said to
be a local interpolation method. After the cell value is computed and assigned
to the cell, the window is moved one cell to the right, and the computations are
performed for that cell. Successively, all cells of the raster are visited in this way.

Figure 5.17: The principle of moving window averaging. In blue, the measurement
points. A virtual window is moved over the raster cells one by one, and some
averaging function computes a field value for the cell, using measurements
within the window.

In part (b) of the figure, the 295th cell value, out of the 418 in total, is being
computed. This computation is based on eleven measurements, while that of
the first cell had no measurements available. Where this is the case, the cell
should be assigned a value that signals this 'non-availability of measurements'.

Suppose there are $n$ measurements selected in a window, and that a
measurement is denoted as $m_i$. The simplest averaging function will compute
the arithmetic mean, treating all measurements equally:

$$\frac{1}{n} \sum_{i=1}^{n} m_i.$$

The principle of spatial autocorrelation suggests that measurements closer to the
cell centre should have greater influence on the predicted value than those
further away. In order to account for this, a distance factor can be brought into
the averaging function. Functions that do this are called inverse distance
weighting functions (IDW). This is one of the most commonly used functions in
interpolating spatial data.

Let us assume that the distance from measurement point $i$ to the cell centre is
denoted by $d_i$. Commonly, the weight factor applied in inverse distance
weighting is the inverse of the distance squared, but in the general case, with an
exponent $p$, the formula is:

$$\sum_{i=1}^{n} \frac{m_i}{d_i^p} \bigg/ \sum_{i=1}^{n} \frac{1}{d_i^p}.$$
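
The following sketch shows one way the inverse distance weighted average for a single cell could be computed, assuming a circular window. The function name `idw` and the parameters `radius` and `power` are assumptions for this illustration, not names used by any particular GIS package. Measurements are `(x, y, value)` tuples.

```python
import math


def idw(cell_centre, measurements, radius, power=2.0):
    """Inverse distance weighted average of the measurements inside the window.

    Returns None when no measurement falls inside the window, so that the
    caller can assign a 'no data' value to the cell.
    """
    cx, cy = cell_centre
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in measurements:
        d = math.hypot(x - cx, y - cy)
        if d > radius:
            continue                  # outside the window: does not contribute
        if d == 0.0:
            return value              # measurement exactly at the cell centre
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total
```

With `power=0` all weights equal 1 and the function reduces to the arithmetic mean; larger exponents give nearby measurements more influence.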


Figure 5.18: Inverse distance weighting as an averaging technique. In green, the
(circular) moving window and its centre. In blue, the measurement points with
their values, and distances to the centre; some are inside, some are outside of the
window.

Moving window averaging has many parameters. As experimentation with any
GIS package will demonstrate, picking the right parameter settings may make
quite a difference for the resulting raster. We discuss some key parameters
below.

• Raster resolution: Too large a cell size will smooth the function too much,
removing local variations; too small a cell size will result in large clusters
of equally valued cells, with little added value.

• Shape/size of window: Most procedures use square windows, but rectangular,
circular or elliptical windows are also possible. These can be useful
in cases where the measurement points are distributed regularly at fixed
distance over the study area, and the window shape must be chosen to
ensure that each raster cell will have its window include the same number
of measurement points. The size of the window is another important matter.
Small windows tend to exaggerate local extreme values, while large
windows have a smoothing effect on the predicted field values.

• Selection criteria: Not necessarily all measurements within the window need
to be used in averaging. We may choose to use at most five (nearest)
measurements, or we may choose to only generate a field value if more
than three measurements are in the window.*

• Averaging function: A final choice is which function is applied to the
selected measurements within the window. It is possible to use different
distance-weighting functions, each of which will influence the calculation
of the resulting value.


In many practical cases, one will have to experiment with parameter settings
to obtain optimal results. When working with time series measurements
(measurement sets at different points in time), one should keep the same parameter
settings between time instants, as otherwise comparisons between fields
computed for different moments in time will make little sense.

*If slope or direction are important aspects of the field, the selection criteria may
even be set in a way to ensure this. One technique, known as quadrant sector
control, implements this by selecting measurements from each quadrant of the
window, to ensure that all directions are represented in the cell's computed value.

Kriging

Kriging was originally developed by mining geologists attempting to derive
accurate estimates of mineral deposits in a given area from limited sample
measurements. It is an advanced interpolation technique belonging to the field of
geostatistics, which can deliver good results if applied properly and with enough
sample points. Kriging is usually used when the variation of an attribute and/or
the density of sample points is such that simple methods of interpolation may
give unreliable predictions.

Kriging is based on the notion that the spatial change of a variable can be
described as a function of the distance between points. It is similar to IDW
interpolation, in that the surrounding values are weighted to derive a value for
an unmeasured location. However, the kriging method also looks at the overall
spatial arrangement of the measured points and the spatial correlation between
their values, to derive values for an unmeasured location.

The first step in the kriging procedure is to compare successive pairs of point
measurements to generate a semi-variogram. In the second step, the
semi-variogram is used to calculate the weights used in interpolation. Although
kriging is a powerful technique, it should not be applied without a good
understanding of geostatistics, including the principle of spatial autocorrelation.
For more detail on the various kriging methods, readers are referred to [11].
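
To give an impression of the first step, the sketch below computes an empirical semi-variogram by grouping all pairs of measurements into distance classes (lags) and taking half the mean squared difference of their values per class. This estimator is the commonly used textbook form and is an assumption here, since the text does not prescribe one; the function name `empirical_semivariogram` and its parameters are likewise hypothetical. Fitting a model to the result and deriving kriging weights is not shown.

```python
import math
from itertools import combinations


def empirical_semivariogram(samples, lag_width, n_lags):
    """Return a list of (lag centre, semi-variance or None, pair count)."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for (x1, y1, z1), (x2, y2, z2) in combinations(samples, 2):
        d = math.hypot(x2 - x1, y2 - y1)
        k = int(d // lag_width)       # index of the distance class
        if k < n_lags:
            sums[k] += (z1 - z2) ** 2
            counts[k] += 1
    result = []
    for k in range(n_lags):
        gamma = sums[k] / (2 * counts[k]) if counts[k] else None
        result.append(((k + 0.5) * lag_width, gamma, counts[k]))
    return result
```

Distance classes with few or no pairs give unreliable or missing values, which is one reason why kriging needs enough sample points.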

Discussion

The interpolation functions discussed above are available in most GISs, though 
each may have slightly different formulations or implementations. These are 
a set of the most commonly used interpolation functions, but by no means the 
only functions that exist. 

It should be noted that there is no single best interpolation method, since each 
method has advantages and disadvantages in particular contexts. As a general 
guide, the following questions should be considered in selecting an appropriate 
method of interpolation: 


• For what type of application will the results be used? 

• What data type is being interpolated (e.g. categorical or continuous)? 

• What is the nature of the surface (for example, is it a 'simple' or complex
surface)?

• What is the scale and resolution of the data (for example, the distance
between sample points)?


It is important to carry out an evaluation of the data set before interpolation
takes place. In such an evaluation, one of the main goals is to establish whether
there are any existing trends in the data set that may influence interpolation.
Trend surfaces can be fitted to the existing data (see the discussion of trend
surface fitting above), followed by an examination of the differences between the
existing data and the resulting trend surface. It is also important to assess the
spatial variability of the existing data. This can be achieved with simple moving
window techniques or some other kind of linear interpolation. Finally, in order
to establish the effect of the interpolation parameters on the result, different sets
of interpolation parameters can be employed, and the results of these compared.

From the discussions above, an appropriate interpolation method and
parameters can be determined. One way to evaluate results is to use an independent
(reference) data set and calculate the difference between the value from this
data set and the interpolated surface at each location. However, 'independent'
datasets do not always exist. In this case, another option is to run a series of
interpolations, leaving out one sample point from the original data for each run.
Again, this makes it possible to compare the results from interpolation with a
known value. If the differences ('errors') found using this method are
unacceptable, either there are not enough sample points for an accurate result, or
one or more of the parameters used for the interpolation is incorrect.
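
The leave-one-out procedure can be sketched as below. The example uses a simple inverse distance weighted predictor without a window as a stand-in for whichever interpolation method is being evaluated; the names `idw_predict`, `leave_one_out_errors` and `rmse` are assumptions for this illustration. Samples are `(x, y, value)` tuples in a list.

```python
import math


def idw_predict(x, y, samples, power=2.0):
    """Stand-in interpolator: inverse distance weighting over all samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        d = math.hypot(sx - x, sy - y)
        if d == 0.0:
            return value
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


def leave_one_out_errors(samples, predict=idw_predict):
    """Predict each sample from all the others; return predicted minus known."""
    errors = []
    for i, (x, y, known) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        errors.append(predict(x, y, rest) - known)
    return errors


def rmse(errors):
    """Root mean square error, one possible summary of the differences."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Running this for several parameter settings (for instance, different values of `power`) and comparing the resulting error summaries is one way to judge which settings suit the data.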

Summary


Digital data can be obtained directly from spatial data providers, or from
pre-existing GIS application projects. A GIS project may also be involved with data
obtained from ground-based surveying, which obviously have to be entered into
the system. Sometimes, however, the data must be obtained from non-digital
sources such as paper maps. In all of these cases, data quality is a key
consideration.

Data cleaning and preparation involves checking for errors and inconsistencies,
and simplifying and merging existing spatial data sets. The problems that one
may encounter may be caused by differences in resolution and differences in
representation. We have discussed various methods to address these issues in
this chapter.

It is often the case that we have captured a sample of points, but wish to derive
a value for the phenomenon at another location or for the whole extent of our
study area. This chapter has discussed a range of point interpolation methods
which can be used to achieve this. While there is no single best method, key
issues to be considered in choosing the appropriate interpolation method include
the application for which the data will be used, the type of data we are dealing
with, the nature of the surface which the data is describing, and the scale and
resolution of the data set.

Questions

1. Data clean-up operations are often executed in a certain order. Why is this? 
Provide a sensible ordering of a number of clean-up operations. 

2. Rasterization of vector data is sometimes required in data preparation. 
What reasons may exist for this? If it is needed, the raster resolution must 
be carefully selected. Argue why. 

3. Take another look at Figure 5.15 and consider the determined values for 
the coefficients in the respective formulae. Make a study of edge effects, 
for instance by computing the approximated field values for the locations 
(-2,10) and (12,10). 

4. Figure 5.18 illustrates the technique of moving window averaging using 
an averaging function that applies inverse distance weighting. What field 
value will be computed for the cell if the averaging function is inverse 
squared distance weighting? 


Chapter 6

Spatial data analysis 


The discussion up until this point has sought to prepare the reader for the 'data
analysis' phase. So far, we have discussed the nature of spatial data,
georeferencing, notions of data acquisition and preparation, and issues relating to
data quality and error.

Before we move on to discuss a range of analytical operations, we should begin
with some clarifications. We know from preceding discussions that the
analytical capabilities of a GIS use spatial and non-spatial (attribute) data to answer
questions and solve problems that are of spatial relevance. It is important to
make a distinction between analysis (or analytical operations) as discussed in
Section 3.3.3, and analytical models (often referred to just as 'models'). By
analysis we mean only a subset of what is usually implied by the term: we do
not specifically deal with statistical analysis (such as cluster detection, for
example). These are advanced concepts and techniques which are outside the scope
of this book.

All knowledge of the world is based on models of some kind, whether they
are simple abstractions, culturally-based stereotypes or complex equations that
describe physical phenomena. We have already seen in Section 1.2.1 that there
are different types of model, and that the word itself means different things in
different contexts. Section 2.1 noted that even spatial data is itself a kind of
'model' of some part of the real world.

In this chapter we will focus on analytical functions that can form the
building blocks for application models. It will hopefully become clear to the reader
that these operations can be combined in various ways for increasingly complex
analyses. Later in the chapter we present an overview of different types of
analytical models and related concepts of which the user should be aware, as well
as an examination of how various errors may degrade the results of our models
or analyses.

6.1 Classification of analytical GIS capabilities


There are many ways to classify the analytical functions of a GIS. The
classification used for this chapter is essentially the one put forward by Aronoff [3]. It
makes the following distinctions, which are addressed in subsequent sections of
the chapter:

1. Classification, retrieval, and measurement functions. All functions in this 
category are performed on a single (vector or raster) data layer, often using 
the associated attribute data. 

• Classification allows the assignment of features to a class on the basis 
of attribute values or attribute ranges (definition of data patterns). On 
the basis of reflectance characteristics found in a raster, pixels may be 
classified as representing different crops, such as potato and maize. 

• Retrieval functions allow the selective search of data. We might thus 
retrieve all agricultural fields where potato is grown. 

• Generalization is a function that joins different classes of objects with
common characteristics to a higher level (generalized) class.* For
example, we might generalize fields where potato or maize, and possibly
other crops, are grown as 'food produce fields'.

*The term generalization has different meanings in different contexts. In
geography the term 'aggregation' is often used to indicate the process that we
call generalization. In cartography, generalization means either the process of
producing a graphic representation of smaller scale from a larger scale original
(cartographic generalization), or the process of deriving a coarser resolution
representation from a more detailed representation within a database (model
generalization). Finally, in computer science generalization is one of the
abstraction mechanisms in object-orientation.

• Measurement functions allow the calculation of distances, lengths, or 
areas. 

More detail can be found in Section 6.2. 

2. Overlay functions. These belong to the most frequently used functions
in a GIS application. They allow the combination of two (or more) spatial
data layers, comparing them position by position, and treating areas of
overlap, and of non-overlap, in distinct ways. Many GISs support overlays
through an algebraic language, expressing an overlay function as a
formula in which the data layers are the arguments. In this way, we can
find:


• The potato fields on clay soils (select the 'potato' cover in the crop 
data layer and the 'clay' cover in the soil data layer and perform an 
intersection of the two areas found), 

• The fields where potato or maize is the crop (select both areas of 
'potato' and 'maize' cover in the crop data layer and take their union), 

• The potato fields not on clay soils (perform a difference operator of 
areas with 'potato' cover with the areas having clay soil), 

• The fields that do not have potato as crop (take the complement of the 
potato areas). 

These are discussed further in Section 6.3. 
3. Neighbourhood functions. Whereas overlays combine features at the same
location, neighbourhood functions evaluate the characteristics of an area
surrounding a feature's location. A neighbourhood function 'scans' the
neighbourhood of the given feature(s), and performs a computation on it.

• Search functions allow the retrieval of features that fall within a given
search window. This window may be a rectangle, circle, or polygon.

• Buffer zone generation (or buffering) is one of the best known
neighbourhood functions. It determines a spatial envelope (buffer) around
(a) given feature(s). The created buffer may have a fixed width, or a
variable width that depends on characteristics of the area.

• Interpolation functions predict unknown values using the known values
at nearby locations. This typically occurs for continuous fields,
like elevation, when the data actually stored does not provide the
direct answer for the location(s) of interest. Interpolation of continuous
data was discussed in Section 5.4.2.

• Topographic functions determine characteristics of an area by looking
at the immediate neighbourhood as well. Typical examples are slope
computations on digital terrain models (i.e. continuous spatial fields).
The slope in a location is defined as the plane tangent to the
topography in that location. Various computations can be performed, such
as:

- determination of slope angle,

- determination of slope aspect,

- determination of slope length,

- determination of contour lines. These are lines that connect points
with the same value (for elevation, depth, temperature, barometric
pressure, water salinity etc).

We discuss these topics more fully in Section 6.4.

4. Connectivity functions. These functions work on the basis of networks,
including road networks, water courses in coastal zones, and communication
lines in mobile telephony. These networks represent spatial linkages
between features. Main functions of this type include:

• Contiguity functions evaluate a characteristic of a set of connected
spatial units. One can think of the search for a contiguous area of forest
of certain size and shape in a satellite image.

• Network analytic functions are used to compute over connected line
features that make up a network. The network may consist of roads, public
transport routes, high voltage lines or other forms of transportation
infrastructure. Analysis of such networks may entail shortest path
computations (in terms of distance or travel time) between two points in
a network for routing purposes. Other forms are to find all points
reachable within a given distance or duration from a start point for
allocation purposes, or determination of the capacity of the network
for transportation between an indicated source location and sink
location.

• Visibility functions also fit in this list as they are used to compute the
points visible from a given location (viewshed modelling or viewshed
mapping) using a digital terrain model.

Details are discussed in Section 6.5.
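
To make the overlay functions listed under item 2 more concrete, the sketch below represents two tiny raster layers as nested lists and expresses the four example queries as set operations on cell positions. The layer contents and the helper name `select` are invented for this illustration; an actual GIS would provide its own overlay language.

```python
# Two small raster layers covering the same area (hypothetical data).
crop = [["potato", "maize"],
        ["potato", "wheat"]]
soil = [["clay", "clay"],
        ["sand", "clay"]]


def select(layer, value):
    """Set of (row, column) positions where the layer has the given value."""
    return {(r, c)
            for r, row in enumerate(layer)
            for c, cell in enumerate(row)
            if cell == value}


all_cells = {(r, c) for r in range(len(crop)) for c in range(len(crop[0]))}
potato = select(crop, "potato")
maize = select(crop, "maize")
clay = select(soil, "clay")

potato_on_clay = potato & clay          # intersection
potato_or_maize = potato | maize        # union
potato_not_on_clay = potato - clay      # difference
not_potato = all_cells - potato         # complement
```

Each result is again a set of positions, so the operations can be chained into more complex expressions.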

6.2 Retrieval, classification and measurement


6.2.1 Measurement 

Geometric measurement on spatial features includes counting, distance and area
size computations. For the sake of simplicity, this section discusses such
measurements in a planar spatial reference system. We limit ourselves to geometric
measurements, and do not include attribute data measurement. In general,
measurements on vector data are more advanced, and thus also more complex,
than those on raster data. We discuss each group.

Measurements on vector data

The primitives of vector data sets are point, (poly)line and polygon. Related 
geometric measurements are location, length, distance and area size. Some of 
these are geometric properties of a feature in isolation (location, length, area 
size); others (distance) require two features to be identified. 

The location property of a vector feature is always stored by the GIS: a single 
coordinate pair for a point, or a list of pairs for a polyline or polygon boundary. 
Occasionally, there is a need to obtain the location of the centroid of a polygon; 
some GISs store these also, others compute them 'on-the-fly'. 

Length is a geometric property associated with polylines, by themselves, or in
their function as polygon boundary. It can obviously be computed by the GIS,
as the sum of lengths of the constituent line segments, but quite often it is also
stored with the polyline.

Area size is associated with polygon features. Again, it can be computed, but 
usually is stored with the polygon as an extra attribute value. This speeds up 
the computation of other functions that require area size values. 

The attentive reader will have noted that all of the above 'measurements' do not 
actually require computation, but only retrieval of stored data. 

Measuring distance between two features is another important function. If both
features are points, say $p$ and $q$, the computation in a Cartesian spatial
reference system is given by the well-known Pythagorean distance function:

$$dist(p, q) = \sqrt{(x_p - x_q)^2 + (y_p - y_q)^2}.$$

If one of the features is not a point, or both are not, we must be precise in
defining what we mean by their distance. All these cases can be summarized as
computation of the minimal distance between a location occupied by the first and a
location occupied by the second feature. This means that features that intersect
or meet, or of which one contains the other, have a distance of 0. We leave a
further case analysis, including polylines and polygons, to the reader as an
exercise. It is not possible to store all distance values for all possible
combinations of two features in any reasonably sized spatial database. As a result, the
system must compute 'on the fly' whenever a distance computation request is made.
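
As a starting point for such a case analysis, the sketch below implements the point-to-point distance and, as one further case, the minimal distance between a point and a polyline. The function names are assumptions for this illustration; points are `(x, y)` tuples in a planar reference system.

```python
import math


def dist(p, q):
    """Pythagorean distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def point_segment_distance(p, a, b):
    """Minimal distance between point p and the line segment from a to b."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return dist(p, a)             # degenerate segment: a and b coincide
    # Position of the perpendicular projection of p, clamped to the segment.
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return dist(p, (ax + t * dx, ay + t * dy))


def point_polyline_distance(p, vertices):
    """Minimal distance between point p and a polyline given by its vertices."""
    return min(point_segment_distance(p, a, b)
               for a, b in zip(vertices, vertices[1:]))
```

A point that lies on the polyline gets distance 0, in line with the definition above.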

Another geometric measurement used by the GIS is the minimal bounding box
computation. It applies to polylines and polygons, and determines the minimal
rectangle (with sides parallel to the axes of the spatial reference system) that
covers the feature. This is illustrated in Figure 6.1. Bounding box computation
is an important support function for the GIS: for instance, if the bounding boxes
of two polygons do not overlap, we know the polygons cannot possibly intersect
each other. Since polygon intersection is a complicated function, but bounding
box computation is not, the GIS will always first apply the latter as a test to see
whether it must do the first.
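
A minimal sketch of this test is shown below. Boxes are represented as `(x_min, y_min, x_max, y_max)` tuples; the function names `bounding_box` and `boxes_overlap` are assumptions for this illustration. Boxes that merely touch are treated as overlapping, so that features that meet are not ruled out.

```python
def bounding_box(vertices):
    """Minimal axis-parallel rectangle covering a polyline or polygon boundary."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_overlap(a, b):
    """True if the two boxes share at least one location."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]
```

Only when `boxes_overlap` returns `True` does the more expensive polygon intersection need to be computed; a `True` result does not by itself mean that the polygons intersect.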




Figure 6.1: The minimal bounding box of (a) a polyline, and (b) a polygon.

For practical purposes, it is important to be aware of the measurement unit that
applies to the spatial data layer that one is working on. This is determined by
the spatial reference system that has been defined for it during data preparation.

A common use of area size measurements is when one wants to sum up the
area sizes of all polygons belonging to some class. This class could be crop type:
What is the size of the area covered by potatoes? If our crop classification is in a
stored data layer, the computation would include (a) selecting the potato areas,
and (b) summing up their (stored) area sizes. Clearly, little geometric
computation is required in the case of stored features. This is not the case when we are
interactively defining our vector features in GIS use, and we want measurements
to be performed on these interactively defined features. Then, the GIS will have
to perform complicated geometric computations.

Measurements on raster data

Measurements on raster data layers are simpler because of the regularity of the 
cells. The area size of a cell is constant, and is determined by the cell resolution. 
Horizontal and vertical resolution may differ, but typically do not. Together with 
the location of a so-called anchor point, this is the only geometric information 
stored with the raster data, so all other measurements by the GIS are computed. 
The anchor point is fixed by convention to be the lower left (or sometimes upper 
left) location of the raster. 

Location of an individual cell derives from the raster's anchor point, the cell
resolution, and the position of the cell in the raster. Again, there are two
conventions: the cell's location can be its lower left corner, or the cell's midpoint.
These conventions are set by the software in use, and in case of low resolution
data they become more important to be aware of.

The area size of a selected part of the raster (a group of cells) is calculated as the 
number of cells multiplied by the cell area size. 

The distance between two raster cells is the standard distance function applied 
to the locations of their respective mid-points, obviously taking into account 
the cell resolution. Where a raster is used to represent line features as strings 
of cells through the raster, the length of a line feature is computed as the sum 
of distances between consecutive cells. This computation is prone to error, as 
already discovered in Chapter 2 (Question 11). 
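
The raster measurements described in this subsection can be sketched as follows. The anchor point, cell size and cell positions are hypothetical, and the sketch assumes a lower left anchor, square cells and the midpoint convention for cell location.

```python
import math
from typing import List, Tuple

Cell = Tuple[int, int]  # (column, row), counted from the lower left cell (0, 0)

# Hypothetical raster definition.
ANCHOR_X, ANCHOR_Y = 1000.0, 2000.0  # anchor point: lower left of the raster
CELL_SIZE = 30.0                     # cell resolution in map units


def cell_midpoint(cell: Cell) -> Tuple[float, float]:
    """Location of a cell under the midpoint convention."""
    col, row = cell
    return (ANCHOR_X + (col + 0.5) * CELL_SIZE,
            ANCHOR_Y + (row + 0.5) * CELL_SIZE)


def area_of(cells: List[Cell]) -> float:
    """Number of cells multiplied by the cell area size."""
    return len(cells) * CELL_SIZE * CELL_SIZE


def cell_distance(a: Cell, b: Cell) -> float:
    """Standard (Euclidean) distance between the two cell midpoints."""
    (ax, ay), (bx, by) = cell_midpoint(a), cell_midpoint(b)
    return math.hypot(bx - ax, by - ay)


def line_length(cells: List[Cell]) -> float:
    """Sum of the distances between consecutive cells of a line feature."""
    return sum(cell_distance(p, q) for p, q in zip(cells, cells[1:]))


# A diagonal line stored as a staircase of cells.
staircase = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)]
raster_length = line_length(staircase)
straight_length = cell_distance((0, 0), (2, 2))
```

In this example the staircase length is noticeably larger than the straight-line distance between its end cells, which illustrates why line lengths derived from rasters are prone to error.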


6 . 2 . Retrieval, classification and measurement 




6.2.2 Spatial selection queries 

When exploring a spatial data set, the first thing one usually wants is to select 
certain features, to (temporarily) restrict the exploration. Such selections can be 
made on geometric/spatial grounds, or on the basis of attribute data associated 
with the spatial features. We discuss both techniques below. 

Interactive spatial selection

In interactive spatial selection, one defines the selection condition by pointing at
or drawing spatial objects on the screen display, after having indicated the
spatial data layer(s) from which to select features. The interactively defined
objects are called the selection objects; they can be points, lines, or polygons.
The GIS then selects the features in the indicated data layer(s) that overlap
(i.e. intersect, meet, contain, or are contained in; see Figure 2.15) with the
selection objects. These become the selected objects.

As we have seen in Section 3.5, spatial data stored in a geodatabase is associated 
with its attribute data through a key/foreign key link. Selections of features lead 
to selections on the records. Vice versa, selection of records may lead to selection 
of features. 

Interactive spatial selection answers questions like "What is at ... ?" In
Figure 6.2, the selection object is a circle and the selected objects are the red
polygons; they overlap with the selection object.

Figure 6.2: All city wards that overlap with the selection object (here a
circle) are selected (left), and their corresponding attribute records are
highlighted (right, only part of the table is shown). Data from an urban
application in Dar es Salaam, Tanzania. Data source: Dept. of Urban & Regional
Planning and Geo-information Management, ITC.

Spatial selection by attribute conditions

It is also possible to select features by using selection conditions on feature
attributes. These conditions are formulated in SQL if the attribute data reside in
a geodatabase. This type of selection answers questions like "Where are the
features with ... ?"




Figure 6.3: Spatial selection using the attribute condition Area < 400000 on land
use areas in Dar es Salaam. Spatial features on left, associated attribute data
(in part) on right. Data source: Dept. of Urban & Regional Planning and
Geo-information Management, ITC.


Figure 6.3 shows an example of selection by attribute condition. The query
expression is Area < 400000, which can be interpreted as "select all the land use
areas of which the size is less than 400,000." The polygons in red are the selected
areas; their associated records are also highlighted in red. We can use this
selected set of features as the basis of further selection. For instance, if we are
interested in land use areas of size less than 400,000 that are of land use type 80,
the selected features of Figure 6.3 are subjected to a further condition,
LandUse = 80. The result is illustrated in Figure 6.4. Such combinations of
conditions are fairly common in practice, so we devote a short discussion to the
theory of combining conditions.
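
The two-step selection can be sketched as follows, using the partial attribute listing that accompanies Figure 6.4. This is only an illustration in Python; in a geodatabase the conditions would be written in SQL, and the variable names here are assumptions made for the example.

```python
# (Area, ID, LandUse) records from the partial attribute table of Figure 6.4.
records = [
    (174308.70, 2, 30), (2066475.00, 3, 70), (214582.50, 4, 80),
    (29313.86, 5, 80), (73328.08, 6, 80), (53303.30, 7, 80),
    (614530.10, 8, 20), (1637161.00, 9, 80), (156357.40, 10, 70),
    (59202.20, 11, 20), (83289.59, 12, 80), (225642.20, 13, 20),
    (28377.33, 14, 40), (228930.30, 15, 30), (986242.30, 16, 70),
]

# First selection: Area < 400000.
small = [r for r in records if r[0] < 400000]

# Further selection on the already selected set: LandUse = 80.
small_type_80 = [r for r in small if r[2] == 80]

# The IDs link the selected records back to the spatial features.
selected_ids = [r[1] for r in small_type_80]
```

Because each record is linked to one feature through its identifier, the list of selected IDs is all that is needed to highlight the corresponding polygons.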



Area          IDs   LandUse
 174308.70      2        30
2066475.00      3        70
 214582.50      4        80
  29313.86      5        80
  73328.08      6        80
  53303.30      7        80
 614530.10      8        20
1637161.00      9        80
 156357.40     10        70
  59202.20     11        20
  83289.59     12        80
 225642.20     13        20
  28377.33     14        40
 228930.30     15        30
 986242.30     16        70

Figure 6.4: Further spatial selection from the already selected features of
Figure 6.3 using the additional condition LandUse = 80 on land use areas. Observe
that fewer features are now selected. Data source: Dept. of Urban & Regional
Planning and Geo-information Management, ITC.

Combining attribute conditions

When multiple criteria have to be used for selection, we need to carefully express 
all of these in a single composite condition. The tools for this come from a field 
of mathematical logic, known as propositional calculus. 

Above, we have seen simple, atomic conditions such as Area < 400000 and
LandUse = 80. Atomic conditions use a predicate symbol, such as < (less than) or
= (equals). Other possibilities are <= (less than or equal), > (greater than),
>= (greater than or equal) and <> (does not equal). Any of these symbols is
combined with an expression on the left and one on the right. For instance,
LandUse <> 80 can be used to select all areas with a land use class different
from 80. Expressions are either constants like 400000 and 80, attribute names
like Area and LandUse, or possibly composite arithmetic expressions like
0.15 × Area, which would compute 15% of the area size.

Atomic conditions can be combined into composite conditions using logical
connectives. The most important ones are AND, OR, NOT and the bracket pair
( ... ). If we write a composite condition like

Area < 400000 AND LandUse = 80,

we can use it to select areas for which both atomic conditions hold true. This is
the meaning of the AND connective. If we had written

Area < 400000 OR LandUse = 80

instead, the condition would have selected areas for which either condition holds,
so effectively those with an area size less than 400,000, but also those with land
use class 80. (Included, of course, will be areas for which both conditions hold.)

The NOT connective can be used to negate a condition. For instance, the
condition NOT (LandUse = 80) would select all areas with a different land use
class than 80. (Clearly, the same selection can be obtained by writing
LandUse <> 80, but this is not the point.) Finally, brackets can be applied to
force grouping amongst atomic parts of a composite condition. For instance, the
condition

(Area < 30000 AND LandUse = 70) OR (Area < 400000 AND LandUse = 80)

will select areas of class 70 less than 30,000 in size, as well as class 80 areas
less than 400,000 in size.
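
The composite conditions above can be tried out with a small sketch. The six records are hypothetical and chosen so that each connective gives a different result; Python's `and`, `or`, `not` and `!=` stand in for AND, OR, NOT and <>.

```python
from typing import Callable, Dict, List

Record = Dict[str, int]

# Hypothetical land use areas.
areas: List[Record] = [
    {"id": 1, "Area": 25000, "LandUse": 70},
    {"id": 2, "Area": 350000, "LandUse": 70},
    {"id": 3, "Area": 350000, "LandUse": 80},
    {"id": 4, "Area": 900000, "LandUse": 80},
    {"id": 5, "Area": 20000, "LandUse": 20},
    {"id": 6, "Area": 500000, "LandUse": 30},
]


def select(condition: Callable[[Record], bool]) -> List[int]:
    """IDs of the areas for which the condition holds."""
    return [r["id"] for r in areas if condition(r)]


# Area < 400000 AND LandUse = 80
both = select(lambda r: r["Area"] < 400000 and r["LandUse"] == 80)

# Area < 400000 OR LandUse = 80
either = select(lambda r: r["Area"] < 400000 or r["LandUse"] == 80)

# NOT (LandUse = 80), and the equivalent LandUse <> 80
negated = select(lambda r: not (r["LandUse"] == 80))
different = select(lambda r: r["LandUse"] != 80)

# (Area < 30000 AND LandUse = 70) OR (Area < 400000 AND LandUse = 80)
grouped = select(lambda r: (r["Area"] < 30000 and r["LandUse"] == 70)
                 or (r["Area"] < 400000 and r["LandUse"] == 80))
```

Comparing `both` with `either` shows how much wider the OR selection is, and `negated` and `different` confirm that the two ways of excluding class 80 select the same areas.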




Spatial selection using topological relationships 

Various forms of topological relationship between spatial objects were discussed 
in Section 2.3.4. These relationships can be useful to select features as well. The 
steps carried out are: 

1. To select one or more features as the selection objects, and 

2. To apply a chosen spatial relationship function to determine the selected 
features that have that relationship with the selection objects. 


Selecting features that are inside selection objects. This type of query uses the
containment relationship between spatial objects. Obviously, polygons can contain
polygons, lines or points, and lines can contain lines or points, but no other
containment relationships are possible.

Figure 6.5 illustrates a containment query. Here, we are interested in finding the
location of medical clinics in the area of Ilala District. We first selected all
areas of Ilala District, using the technique of selection by attribute condition
District = "Ilala". Then, these selected areas were used as selection objects to
determine which medical clinics (as point objects) were within them.
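
A containment query of points against a polygon can be sketched with the ray-casting test below. The district outline and clinic locations are hypothetical, and ray casting is one well-known technique for this test, not necessarily the one a given GIS uses; points lying exactly on the boundary are not treated specially here.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how often a ray to the right of pt crosses an edge."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical selection object (a district outline) and point features.
district = [(0, 0), (10, 0), (10, 6), (5, 10), (0, 6)]
clinics: Dict[str, Point] = {"A": (2, 3), "B": (9, 9), "C": (5, 8), "D": (12, 1)}

selected = [name for name, loc in clinics.items()
            if point_in_polygon(loc, district)]
```

An odd number of crossings means the point lies inside the polygon. When several polygons act as selection objects, as with the wards of a district, a point is selected if it lies inside any of them.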


Selecting features that intersect. The intersect operator identifies features that
are not disjoint in the sense of Figure 2.15, but now extended to include points
and lines. Figure 6.6 provides an example of spatial selection using the intersect
relationship between lines and polygons. We selected all roads intersecting Ilala
District.

Figure 6.5: Spatial selection using containment. In dark green, all wards
within Ilala District as the selection objects. In red, all medical clinics located
inside these areas, and thus inside the district. Data source: Dept. of Urban &
Regional Planning and Geo-information Management, ITC.


Selecting features adjacent to selection objects. Adjacency is the meet
relationship of Section 2.3.4. It expresses that features share boundaries, and
therefore it applies only to line and polygon features. Figure 6.7 illustrates a
spatial adjacency query. We want to select all parcels adjacent to an industrial
area. The first step is to select that area (in dark green) and then apply the
adjacency function to select all land use areas (in red) that are adjacent to it.

Figure 6.6: Spatial selection using intersection. The wards of Ilala District
function as the selection objects (in dark green), and all roads (partially) in
the district are selected (in red). Data source: Dept. of Urban & Regional
Planning and Geo-information Management, ITC.

Selecting features based on their distance. One may also want to use the
distance function of the GIS as a tool in selecting features. Such selections can be
searches within a given distance from the selection objects, at a given distance, or
even beyond a given distance. There is a whole range of applications to this type
of selection, e.g.:

• Which clinics are within 2 kilometres of a selected school? (Information
needed for the school emergency plan.)

• Which roads are within 200 metres of a medical clinic? (These roads must
have a high road maintenance priority.)

Figure 6.8 illustrates a spatial selection using distance. Here, we executed the 
selection of the second example above. Our selection objects were all clinics, 
and we selected the roads that pass by a clinic within 200 metres. 

In situations in which we know the distance criteria to use (for selections
within, at or beyond that distance value) the GIS has many (straightforward)
computations to perform. Things become more complicated if our distance selection
condition involves the word 'nearest' or 'farthest'. The reason is that not only
must the GIS compute distances from a selection object A to all potentially
selectable features F, but also it must find that feature F that is nearest to
(resp., farthest away from) object A. So, this requires an extra computational
step to determine minimum (maximum) values. Most GIS packages support this type
of selection, though the mechanics ('the buttons to use') differ.
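
The difference between a fixed-distance selection and a 'nearest' or 'farthest' selection can be sketched as follows. The school and clinic coordinates (in metres) are hypothetical, and only point features are handled; selecting roads would additionally need a point-to-line distance function.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def distance(p: Point, q: Point) -> float:
    """Standard (Euclidean) distance between two points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


# Hypothetical selection object A and selectable features F (metres).
school: Point = (300.0, 400.0)
clinics: Dict[str, Point] = {
    "c1": (0.0, 0.0),
    "c2": (500.0, 0.0),
    "c3": (3300.0, 4400.0),
}

# Step needed for every distance selection: distances from A to all F.
distances = {name: distance(school, loc) for name, loc in clinics.items()}

# Known distance criterion: a simple comparison per feature.
within_2km = sorted(n for n, d in distances.items() if d <= 2000.0)
beyond_2km = sorted(n for n, d in distances.items() if d > 2000.0)

# 'Nearest' and 'farthest': an extra minimum / maximum step over all F.
nearest = min(distances, key=distances.get)
farthest = max(distances, key=distances.get)
```

The within and beyond selections test each feature on its own, whereas the nearest and farthest selections can only be decided after all distances are known.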




Afterthought on selecting features. So far we have discussed a number of
different techniques for selecting features. We have also seen that selection
conditions on attribute values can be combined using logical connectives like AND,
OR and NOT. In fact, the other techniques of selecting features can usually also
be combined. Any set of selected features can be used as the input for a
subsequent selection procedure. This means, for instance, that we can select
all medical clinics first, then identify the roads within 200 metres, then select
from them only the major roads, then select the nearest clinics to these
remaining roads, as the ones that should receive our financial support. In this
way, we are combining various techniques of selection.

Figure 6.7: Spatial selection using adjacency. Our selection object is an
industrial area near downtown Dar es Salaam, Tanzania; our adjacency selection
finds all adjacent land use areas. Data source: Dept. of Urban & Regional Planning
and Geo-information Management, ITC.

Figure 6.8: Spatial selection using the distance function. With all clinics being
our selection objects, we searched for roads that pass by within 200 metres.
Observe that this also selects road segments that are far away from any clinic,
simply because they belong to a road of which a segment is nearby. Data source:
Dept. of Urban & Regional Planning and Geo-information Management, ITC.

6.2.3 Classification

Classification is a technique of purposefully removing detail from an input data
set, in the hope of revealing important patterns (of spatial distribution). In the
process, we produce an output data set, so that the input set can be left intact.

We do so by assigning a characteristic value to each element in the input set,
which is usually a collection of spatial features that can be raster cells or points,
lines or polygons. If the number of characteristic values is small in comparison
to the size of the input set, we have classified the input set.

The pattern that we look for may be the distribution of household income in a
city. Household income is called the classification parameter. If we know for
each ward in the city the associated average income, we have many different
values. Subsequently, we could define five different categories (or: classes) of
income: 'low', 'below average', 'average', 'above average' and 'high', and
provide value ranges for each category. If these five categories are mapped in a
sensible colour scheme, this may reveal interesting information. This has been
done for Dar es Salaam in Figure 6.9 in two ways.

The input data set may itself have been the result of a classification, and in such a
case we call it a reclassification. For example, we may have a soil map that shows
different soil type units and we would like to show the suitability of units for
a specific crop. In this case, it is better to assign to the soil units an attribute
of suitability for the crop. Since different soil types may have the same crop
suitability, a classification may merge soil units of different type into the same
category of crop suitability.

In classification of vector data, there are two possible results. In the first, the
input features may become the output features in a new data layer, with an
additional category assigned. In other words, nothing changes with respect to the
spatial extents of the original features. Figure 6.9(a) is an illustration of this first
type of output. A second type of output is obtained when adjacent features with
the same category are merged into one bigger feature. Such post-processing
functions are called spatial merging, aggregation or dissolving. An illustration of
this second type is found in Figure 6.9(b). Observe that this type of merging is
only an option in vector data, as merging cells in an output raster on the basis
of a classification makes little sense. Vector data classification can be performed
on point sets, line sets or polygon sets; the optional merge phase is sensible
only for lines and polygons. Below, we discuss two kinds of classification:
user-controlled and automatic.

Figure 6.9: Two classifications of average annual household income per ward in
Dar es Salaam, Tanzania. Higher income areas in darker greens. Five categories
were identified, (a) with original polygons left intact; (b) with original polygons
merged when in same category. The data used for this illustration are not factual.


Household income range   New category value
391-2474                 1
2475-6030                2
6031-8164                3
8165-11587               4
11588-21036              5

Table 6.1: Classification table used in Figure 6.9.

User-controlled classification

In user-controlled classification, a user selects the attribute(s) that will be used as
the classification parameter(s) and defines the classification method. The latter
involves declaring the number of classes as well as the correspondence between
the old attribute values and the new classes. This is usually done via a
classification table. The classification table used for Figure 6.9 is displayed in
Table 6.1. It is rather typical for cases in which the used parameter domain is
continuous (as in household income). Then, the table indicates value ranges to be
mapped to the same category. Observe that categorical values are ordinal data, in
the sense of Section 2.2.3.

Another case exists when the classification parameter is nominal or at least
discrete. Such an example is given in Figure 6.10.

We must also define the data format of the output, as a spatial data layer, which
will contain the new classification attribute. The data type of this attribute is
always categorical, i.e. integer or string, whatever the data type of the
attribute(s) from which the classification was obtained.

Sometimes, one may want to perform classification only on a selection of
features. In such cases, there are two options for the features that are not selected.
One option is to keep their original values, while the other is to assign a null
value to them in the output data set. A null value is a special value that means
that no applicable value is present. Care must be taken to deal with these values
correctly, both in computation and in visualization.
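
A classification table such as Table 6.1 can be applied with a simple lookup, sketched below. The range bounds are those of Table 6.1; the ward names, income values and the choice of `None` as the null value are assumptions made for the example.

```python
from bisect import bisect_right
from typing import Dict, Optional

# Lower bounds of the five income ranges of Table 6.1, and the upper limit
# of the last range.
LOWER_BOUNDS = [391, 2475, 6031, 8165, 11588]
UPPER_LIMIT = 21036


def classify_income(value: float) -> Optional[int]:
    """Category 1..5 according to Table 6.1, or None (null) if out of range."""
    if value < LOWER_BOUNDS[0] or value > UPPER_LIMIT:
        return None
    # Number of lower bounds that are <= value equals the category number.
    return bisect_right(LOWER_BOUNDS, value)


# Hypothetical average household income per ward.
incomes: Dict[str, float] = {"ward A": 1800, "ward B": 9100, "ward C": 15000}

# Classification restricted to a selection; unselected wards get a null value.
selected = {"ward A", "ward C"}
output = {ward: (classify_income(v) if ward in selected else None)
          for ward, v in incomes.items()}
```

Whatever the data type of the input attribute, the output attribute is categorical (here an integer), and any later computation or map legend has to treat the null values separately.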

Code   Old category            New category
10     Planned residential     Residential
20     Industry                Commercial
30     Commercial              Commercial
40     Institutional           Public
50     Transport               Public
60     Recreational            Public
70     Non built-up            Non built-up
80     Unplanned residential   Residential

Figure 6.10: An example of a classification on a discrete parameter, namely
land use unit in the city of Dar es Salaam, Tanzania. Colour scheme:
Residential (brown), Commercial (yellow), Public (olive), Non built-up (orange).
Data source: Dept. of Urban & Regional Planning and Geo-information
Management, ITC.

Automatic classification

User-controlled classifications require a classification table or user interaction.
GIS software can also perform automatic classification, in which a user only
specifies the number of classes in the output data set. The system automatically
determines the class break points. Two main techniques of determining
break points are in use.

1. Equal interval technique: The minimum and maximum values $v_{\min}$ and
$v_{\max}$ of the classification parameter are determined and the (constant)
interval size for each category is calculated as $(v_{\max} - v_{\min})/n$, where
$n$ is the number of classes chosen by the user. This classification is useful in
revealing the distribution patterns as it determines the number of features in each
category.

2. Equal frequency technique: This technique is also known as quantile
classification. The objective is to create categories with roughly equal numbers of
features per category. The total number of features is determined first and,
dividing it by the required number of categories, the number of features per
category is calculated. The class break points are then determined by counting off
the features in order of classification parameter value.


Both techniques are illustrated on a small 5 × 5 raster in Figure 6.11.
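
Both ways of finding break points can be sketched as follows. The ten input values are hypothetical (they are not the cell values of Figure 6.11), and the way ties and the maximum value are handled is a choice made for this sketch; GIS packages may differ in such details.

```python
import math
from bisect import bisect_left
from typing import List


def equal_interval_class(value: float, vmin: float, vmax: float, n: int) -> int:
    """Class 1..n, using n intervals of constant size (vmax - vmin) / n."""
    width = (vmax - vmin) / n                       # assumes vmax > vmin
    return min(n, int((value - vmin) / width) + 1)  # vmax itself goes to class n


def equal_frequency_breaks(values: List[float], n: int) -> List[float]:
    """Upper break value of each class, found by counting off sorted values."""
    ordered = sorted(values)
    per_class = len(ordered) / n
    return [ordered[min(len(ordered), math.ceil(k * per_class)) - 1]
            for k in range(1, n + 1)]


def class_from_breaks(value: float, breaks: List[float]) -> int:
    """First class whose upper break is not smaller than the value."""
    return bisect_left(breaks, value) + 1


# Hypothetical classification parameter values and number of classes.
values = [1, 1, 2, 3, 4, 4, 5, 8, 9, 10]
n = 5

interval_classes = [equal_interval_class(v, min(values), max(values), n)
                    for v in values]
breaks = equal_frequency_breaks(values, n)
frequency_classes = [class_from_breaks(v, breaks) for v in values]
```

With these values the equal interval classes hold unequal numbers of features, while the equal frequency classes hold two features each. Equal values always end up in the same class here, so heavily tied data can make the frequencies only roughly equal.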


When to use which? Which of these techniques should be applied to a given 
dataset depends upon the purpose of the analysis (what the user is trying to 
achieve) as well as the characteristics of the data itself. The reader is encouraged
to experiment with their data and compare the results given by each method. 
Other (and possibly better) techniques exist. 

While these two types of classification can be used in spatial analysis, they are 
also frequently used to develop visualizations of the same phenomena. In terms 
of analytical operations we refer to some kind of calculation or function which 
will use these categories. In terms of visualization, we refer to the graphical 
representation of the data using these classifications. Just as either technique 
yields different results in numeric terms, it will do the same in visual terms. 
Please refer to Chapter 7 for more discussion on issues relating to mapping and 
visualization. 

(a) original raster

(b) equal interval classification

    1 1 1 1 4
    2 2 3 2 5
    2 2 2 1 5
    2 3 3 4 4
    2 1 1 1 1

    original value   new value   # cells
    1, 2             1           9
    3, 4             2           8
    5, 6             3           3
    7, 8             4           3
    9, 10            5           2

Figure 6.11: Example of two automatic classification techniques: (a) the
original raster with cell values; (b) classification based on equal intervals;
(c) classification based on equal frequencies. Below, the respective
classification tables, with a tally of the number of cells involved.

6.3 Overlay functions


In the previous section, we saw various techniques of measuring and selecting
spatial data. We also discussed the generation of a new spatial data layer from an
old one, using classification. In this section, we look at techniques of combining
two spatial data layers and producing a third from them. The binary operators
that we discuss are known as spatial overlay operators. We will first discuss
vector overlay operators, and then focus on the raster case.

Standard overlay operators take two input data layers, and assume they are
georeferenced in the same system, and overlap in study area. If either of these
requirements is not met, the use of an overlay operator is senseless. The principle
of spatial overlay is to compare the characteristics of the same location in both
data layers, and to produce a result for each location in the output data layer.
The specific result to produce is determined by the user. It might involve a
calculation, or some other logical function to be applied to every area or location.

In raster data, as we shall see, these comparisons are carried out between pairs
of cells, one from each input raster. In vector data, the same principle of
comparing locations applies, but the underlying computations rely on determining
the spatial intersections of features from each input layer.

6.3.1 Vector overlay operators

In the vector domain, overlay is computationally more demanding than in the 
raster domain. Here we will only discuss overlays from polygon data layers, but 
we note that most of the ideas also apply to overlay operations with point or line 
data layers. 


[Figure 6.12 panels: vector data layer A; vector data layer B]

Figure 6.12: The polygon intersect (overlay) operator. Two polygon layers A and B
produce a new polygon layer (with associated attribute table) that contains all
intersections of polygons from A and B. Figure after [8].


Figure 6.13: The residential areas of Ilala District, obtained from polygon
intersection. Inputs for the polygon intersection operator were (a) a polygon
layer with all Ilala wards, (b) a polygon layer with the residential areas, as
classified in Figure 6.10. Data source: Dept. of Urban & Regional Planning and
Geo-information Management, ITC.

The standard overlay operator for two layers of polygons is the polygon
intersection operator. It is fundamental, as many other overlay operators proposed
in the literature or implemented in systems can be defined in terms of it. The
principles are illustrated in Figure 6.12. The result of this operator is the
collection of all possible polygon intersections; the attribute table result is a
join (in the relational database sense of Chapter 3) of the two input attribute
tables. This output attribute table only contains one tuple for each intersection
polygon found, and this explains why we call this operator a spatial join.
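
The combination of geometric intersection and attribute join can be sketched as follows. To keep the example short, axis-aligned rectangles stand in for general polygons, which is a strong simplification; the layers, attribute names and values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Feature = Tuple[Rect, Dict[str, str]]     # geometry plus attribute tuple


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Intersection of two rectangles, or None if it has no area."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin < xmax and ymin < ymax:
        return (xmin, ymin, xmax, ymax)
    return None


def polygon_intersection(layer_a: List[Feature],
                         layer_b: List[Feature]) -> List[Feature]:
    """One output feature, with joined attributes, per non-empty intersection."""
    result = []
    for geom_a, attrs_a in layer_a:
        for geom_b, attrs_b in layer_b:
            geom = intersect(geom_a, geom_b)
            if geom is not None:
                result.append((geom, {**attrs_a, **attrs_b}))
    return result


# Hypothetical input layers.
wards = [((0, 0, 4, 4), {"ward": "W1"}), ((4, 0, 8, 4), {"ward": "W2"})]
land_use = [((2, 1, 6, 3), {"landuse": "Residential"}),
            ((7, 3, 9, 5), {"landuse": "Public"})]

overlay = polygon_intersection(wards, land_use)
```

Each output feature carries the attributes of the two input features it came from, and pairs of input features that do not overlap contribute nothing to the output table.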

A more practical example is provided in Figure 6.13, which was produced by
polygon intersection of the ward polygons with land use polygons classified as
in Figure 6.10. This has allowed us to select the residential areas in Ilala District.

Two more polygon overlay operators are illustrated in Figure 6.14. The first is
known as the polygon clipping operator. It takes a polygon data layer and restricts
its spatial extent to the generalized outer boundary obtained from all (selected)
polygons in a second input layer. Besides this generalized outer boundary, no
other polygon boundaries from the second layer play a role in the result.

A second overlay operator is polygon overwrite. The result of this binary operator
is defined as a polygon layer with the polygons of the first layer, except where
polygons existed in the second layer, as these take priority. The principle is
illustrated in the lower half of Figure 6.14. Most GISs do not force the user to
apply overlay operators to the full polygon data set. One is allowed to first select
relevant polygons in the data layer, and then use the selected set of polygons as
an operator argument.

The fundamental operator of all these is polygon intersection. The others can be
defined in terms of it, usually in combination with polygon selection and/or
classification. For instance, the polygon overwrite of A by B can be defined as
polygon intersection between A and B, followed by a (well-chosen) classification
that prioritizes polygons in B, followed by a merge. The reader is asked to verify
this.

Vector overlays are usually also defined for point or line data layers. Their
definition parallels the definitions of operators discussed above. Different GISs
use different names for these operators, and one is advised to carefully check the
documentation before applying any of these operators.

Figure 6.14: Two more polygon overlay operators: (a) polygon clip overlay
clips down the left hand polygon layer to the generalized spatial extent of the
right hand polygon layer; (b) polygon overwrite overlay overwrites the left hand
polygon layer with the polygons of the right hand layer.

6.3.2 Raster overlay operators

Vector overlay operators are useful, but geometrically complicated, and this
sometimes results in poor operator performance. Raster overlays do not
suffer from this disadvantage, as most of them perform their computations cell by
cell, and thus they are fast.

GISs that support raster processing (as most do) usually have a language to
express operations on rasters. These languages are generally referred to as map
algebra [54], or sometimes raster calculus. They allow a GIS to compute new
rasters from existing ones, using a range of functions and operators.
Unfortunately, not all implementations of map algebra offer the same functionality.
The discussion below is to a large extent based on general terminology, and
attempts to illustrate the key operations using a logical, structured language.
Again, the syntax often differs between GIS software packages.

When producing a new raster we must provide a name for it, and define how it
is computed. This is done in an assignment statement of the following format:


Output_raster_name := Map_algebra_expression.


The expression on the right is evaluated by the GIS, and the raster in which it
results is then stored under the name on the left. The expression may contain
references to existing rasters, operators and functions; the format is made clear
below. The raster names and constants that are used in the expression are called
its operands. When the expression is evaluated, the GIS will perform the
calculation on a pixel by pixel basis, starting from the first pixel in the first
row, and continuing until the last pixel in the last row. There is a wide range of
operators and functions that can be used in map algebra, which we discuss below.




Arithmetic operators 

Various arithmetic operators are supported. The standard ones are
multiplication (×), division (/), subtraction (-) and addition (+). Obviously, these
arithmetic operators should only be used on appropriate data values, and for
instance, not on classification values.

Other arithmetic operators may include modulo division (MOD) and integer
division (DIV). Modulo division returns the remainder of division: for instance,
10 MOD 3 will return 1 as 10 - 3 × 3 = 1. Similarly, 10 DIV 3 will return 3.
More operators are goniometric: sine (sin), cosine (cos), tangent (tan), and their
inverse functions asin, acos, and atan, which return radian angles as real values.

Figure 6.15: Examples of arithmetic map algebra expressions


Some simple map algebra assignments are illustrated in Figure 6.15. The
assignment:


C1 := A + 10


will add the constant 10 to all cell values of raster A and store the result
as output raster C1. The assignment:


C2 := A + B


will add the values of A and B cell by cell, and store the result as raster C2.
Finally, the assignment

C3 := (A - B)/(A + B) × 100


will create output raster C3, as the result of the subtraction (cell by cell, as usual)
of B cell values from A cell values, divided by their sum. The result is
multiplied by 100. This expression, when carried out on AVHRR channel 1 (red) and
AVHRR channel 2 (near infrared) of NOAA satellite imagery, is known as the
NDVI (Normalized Difference Vegetation Index). It has proven to be a good
indicator of the presence of green vegetation.
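
The three assignments can be mimicked in a few lines, as sketched below. The helper function and the 2 × 2 rasters are hypothetical, rasters are represented as nested lists, and the sketch assumes that A + B is non-zero in every cell; a real implementation would also have to deal with null values.

```python
from typing import Callable, List, Union

Raster = List[List[float]]


def map_algebra(op: Callable[[float, float], float],
                a: Raster, b: Union[Raster, float]) -> Raster:
    """Apply op cell by cell; b is a raster of the same shape or a constant."""
    if isinstance(b, (int, float)):
        return [[op(x, b) for x in row] for row in a]
    return [[op(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


# Hypothetical input rasters.
A = [[6, 10], [4, 7]]
B = [[2, 6], [4, 1]]

C1 = map_algebra(lambda x, y: x + y, A, 10)                    # C1 := A + 10
C2 = map_algebra(lambda x, y: x + y, A, B)                     # C2 := A + B
C3 = map_algebra(lambda x, y: (x - y) / (x + y) * 100, A, B)   # C3 := (A - B)/(A + B) × 100

# MOD and DIV correspond to Python's % and // for positive integers.
remainder, quotient = 10 % 3, 10 // 3
```

Every output cell depends only on the cells at the same position in the inputs, which is what makes this kind of raster overlay fast.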

Comparison and logical operators

Map algebra also allows the comparison of rasters cell by cell. To this end, we
may use the standard comparison operators (<, <=, =, >=, > and <>) that we
introduced before.

A simple raster comparison assignment is:

C := A <> B.

It will store truth values (either true or false) in the output raster C. A cell
value in C will be true if the cell's value in A differs from that cell's value in B.
It will be false if they are the same.

Logical connectives are also supported in most implementations of map algebra.
We have already seen the connectives AND, OR and NOT in Section 6.2.2.
Another connective that is commonly offered in map algebra is exclusive OR
(XOR). The expression a XOR b is true only if either a or b is true, but not both.

Examples of the use of these comparison operators and connectives are provided
in Figure 6.16 and Figure 6.17. The latter figure provides various raster
computations in search of forests at specific elevations. In the figure, raster D1
indicates forest below 500 m, D2 indicates areas that are forest or lie below
500 m (or both), raster D3 areas that are either forest or below 500 m (but not
both at the same time), and raster D4 indicates forests at or above 500 m.
Figure 6.16: Examples of logical expressions in map algebra. Green cells
represent true values, white cells represent false values.

D1 := (A = "forest") AND (B < 500)
D2 := (A = "forest") OR (B < 500)
D3 := (A = "forest") XOR (B < 500)
D4 := (A = "forest") AND NOT (B < 500)

Figure 6.17: Examples of complex logical expressions in map algebra. A is a
classified raster for land use (F = forest), and B holds elevation values
(7 = 700 m, 6 = 600 m, 4 = 400 m).
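
The four expressions of Figure 6.17 can be imitated as follows. This is a
minimal sketch: the helper overlay and the small rasters are assumptions made
for illustration, and the cell values are not those of the figure.

```python
def overlay(expression, *rasters):
    """Evaluate a cell-wise expression over rasters of identical dimensions."""
    return [[expression(*cells) for cells in zip(*rows)]
            for rows in zip(*rasters)]

# Hypothetical land use raster A and elevation raster B (in metres).
A = [["forest", "forest", "grass"],
     ["forest", "grass", "grass"]]
B = [[700, 400, 400],
     [600, 600, 400]]

is_forest = overlay(lambda a: a == "forest", A)
is_low = overlay(lambda b: b < 500, B)

D1 = overlay(lambda f, low: f and low, is_forest, is_low)      # AND
D2 = overlay(lambda f, low: f or low, is_forest, is_low)       # OR
D3 = overlay(lambda f, low: f != low, is_forest, is_low)       # XOR on truth values
D4 = overlay(lambda f, low: f and not low, is_forest, is_low)  # AND NOT
```

On truth values, XOR is simply inequality, which is why D3 uses the != operator.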
Conditional expressions

The above comparison and logical operators produce rasters with the truth
values true and false. In practice, we often need a conditional expression with
them that allows us to test whether a condition is fulfilled. The general format
is:

Output_raster := CON(condition, then_expression, else_expression).

Here, condition is the tested condition, then_expression is evaluated if condition
holds, and else_expression is evaluated if it does not hold.

This means that an expression like CON(A = "forest", 10, 0) will evaluate to 10
for each cell in the output raster where the same cell in A is classified as
forest. In each cell where this is not true, the else_expression is evaluated,
resulting in 0. Another example is provided in Figure 6.18, showing that values
for the then_expression and the else_expression can be some integer (possibly
derived from another calculation) or values derived from other rasters. In this
example, the output raster C1 is assigned the values of input raster B wherever
the cells of input raster A contain forest. The cells in output raster C2 are
assigned 10 wherever the elevation (B) is equal to 7 and the groundcover (A) is
forest.


^ We have already noted that specific software packages may differ in the
specifics of the syntax that make up an expression. This extends to the actual
commands, with some packages using "IFF" instead of "CON".

Figure 6.18: Examples of conditional expressions in map algebra. Here A is a
classified raster holding land use data, and B is an elevation value raster.
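
A conditional expression of this kind might be sketched as below. The function
con is a hypothetical stand-in for the CON operator, the rasters are invented,
and the else-values of 0 for C1 and C2 are assumptions, since the text does not
state them.

```python
def con(condition, then_expression, else_expression):
    """Cell-wise CON; each expression is either a raster or a constant."""
    def value(expression, r, c):
        return expression[r][c] if isinstance(expression, list) else expression
    return [[value(then_expression if flag else else_expression, r, c)
             for c, flag in enumerate(row)]
            for r, row in enumerate(condition)]

# Hypothetical land use raster A and elevation raster B.
A = [["forest", "grass"], ["forest", "forest"]]
B = [[7, 6], [4, 7]]

is_forest = [[a == "forest" for a in row] for row in A]
forest_at_7 = [[a == "forest" and b == 7 for a, b in zip(row_a, row_b)]
               for row_a, row_b in zip(A, B)]

C0 = con(is_forest, 10, 0)     # CON(A = "forest", 10, 0)
C1 = con(is_forest, B, 0)      # values of B wherever A is forest
C2 = con(forest_at_7, 10, 0)   # 10 wherever B = 7 and A is forest
```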
6.3.3 Overlays using a decision table

Conditional expressions are powerful tools in cases where multiple criteria must
be taken into account. A small example may illustrate this. Consider a
suitability study in which a land use classification and a geological classification
must be used. The respective rasters are illustrated in Figure 6.19 on the left.
Domain expertise dictates that some combinations of land use and geology result
in suitable areas, whereas other combinations do not. In our example, forests
on alluvial terrain and grassland on shale are considered suitable combinations,
while the others are not.

We could produce the output raster of Figure 6.19 with a map algebra expression 
such as: 


Suitability := CON((Landuse = "Forest" AND Geology = "Alluvial") OR
                   (Landuse = "Grass" AND Geology = "Shale"),
                   "Suitable", "Unsuitable")


and consider ourselves lucky that there are only two 'suitable' cases. In practice, 
many more cases must usually be covered, and then writing up a complex CON 
expression is not an easy task. 

To this end, some GISs accommodate setting up a separate decision table that
will guide the raster overlay process. This extra table carries domain expertise,
and dictates which combinations of input raster cell values will produce which
output raster cell value. This gives us a raster overlay operator using a decision
table, as illustrated in Figure 6.19. The GIS will have supporting functions to
generate the additional table from the input rasters, and to enter appropriate
values in the table.

Decision table (rows: land use; columns: geology)

              Alluvial      Shale
Forest        Suitable      Unsuitable
Grass         Unsuitable    Suitable
Lake          Unsuitable    Unsuitable

Figure 6.19: The use of a decision table in raster overlay. The overlay is
computed in a suitability study, in which land use and geology are important
factors. The meaning of values in both input rasters, as well as the output
raster, can be understood from the decision table.
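
A decision-table overlay can be sketched as a simple lookup. The table below
mirrors the combinations of Figure 6.19; the function name and the two small
input rasters are assumptions made for illustration.

```python
# Decision table: (land use, geology) -> suitability, as in Figure 6.19.
DECISION_TABLE = {
    ("Forest", "Alluvial"): "Suitable",
    ("Forest", "Shale"): "Unsuitable",
    ("Grass", "Alluvial"): "Unsuitable",
    ("Grass", "Shale"): "Suitable",
    ("Lake", "Alluvial"): "Unsuitable",
    ("Lake", "Shale"): "Unsuitable",
}

def overlay_with_table(landuse, geology, table):
    """Look up each cell's pair of input values in the decision table."""
    return [[table[(lu, geo)] for lu, geo in zip(row_lu, row_geo)]
            for row_lu, row_geo in zip(landuse, geology)]

# Hypothetical input rasters.
landuse = [["Forest", "Grass"], ["Lake", "Grass"]]
geology = [["Alluvial", "Alluvial"], ["Shale", "Shale"]]

suitability = overlay_with_table(landuse, geology, DECISION_TABLE)
```

Compared with a nested CON expression, adding a new land use or geology class
only requires extra rows in the table, not a rewritten expression.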
6.4 Neighbourhood functions


In our section on overlay operators, the guiding principle was to compare or 
combine the characteristic value of a location from two data layers, and to do 
so for all locations. This is what map algebra, for instance, gave us: cell by cell 
calculations, with the results stored in a new raster. 

There is another guiding principle in spatial analysis that can be equally useful. 
The principle here is to find out the characteristics of the vicinity, here called 
neighbourhood, of a location. After all, many suitability questions, for instance, 
depend not only on what is at the location, but also on what is near the location. 
Thus, the GIS must allow us to 'look around locally'.

To perform neighbourhood analysis, we must: 

1. State which target locations are of interest to us, and define their spatial 
extent, 

2. Define how to determine the neighbourhood for each target, 

3. Define which characteristic(s) must be computed for each neighbourhood. 

For instance, our target might be a medical clinic. Its neighbourhood could be 
defined as: 


• An area within 2 km distance as the crow flies, or 

• An area within 2 km travel distance, or 
• All roads within 500 m travel distance, or

• All other clinics within 10 minutes travel time, or 

• All residential areas, for which the clinic is the closest clinic. 


The alert reader will note the increasingly complex definitions of
'neighbourhood' used here. This is to illustrate that different ways of measuring
neighbourhoods exist, and some are better (or more representative of real
neighbourhoods) than others, depending on the purpose of the analysis.

Then, in the third step we indicate what it is we want to discover about the 
phenomena that exist or occur in the neighbourhood. This might simply be its 
spatial extent, but it might also be statistical information like: 

• The total population of the area, 

• Average household income, or 

• The distribution of high-risk industries located in the neighbourhood. 


The above are typical questions in an urban setting. When our interest is more in
natural phenomena, different examples of locations, neighbourhoods and
neighbourhood characteristics arise. Since raster data are the more commonly
used in this case, neighbourhood characteristics often are obtained via statistical
summary functions that compute values such as average, minimum, maximum, and
standard deviation of the cells in the identified neighbourhood.
Determining neighbourhood extent

To select target locations, one can use the selection techniques that we
discussed in Section 6.2.2. To obtain characteristics from an eventually
identified neighbourhood, the same techniques apply. So what remains to be
discussed here is the proper determination of a neighbourhood.

One way of determining a neighbourhood around a target location is by making
use of the geometric distance function. We discuss some of these techniques in
Section 6.4.1. Geometric distance does not take direction into account, and
certain phenomena can only be studied by doing so: for example, pollution
spread by rivers, ground water flow, or prevailing weather systems.

The more advanced techniques for computation of flow and diffusion are
discussed in Section 6.4.2. Diffusion functions are based on the assumption that
the phenomenon spreads in all directions, though not necessarily equally easily
in all directions. Hence, they use local terrain characteristics to compute the
local resistance against diffusion. In flow computations, the assumption is that
the phenomenon will choose a least-resistance path, and not spread in all
directions. This, as we will see, involves the computation of preferred local
direction of spread. Both flow and diffusion computations take local
characteristics into account, and are therefore more easily performed on raster
data.
6.4.1 Proximity computations

In proximity computations, we use geometric distance to define the
neighbourhood of one or more target locations. The most common and useful
technique is buffer zone generation. Another technique based on geometric
distance that we discuss is Thiessen polygon generation.

Buffer zone generation


The principle of buffer zone generation is simple: we select one or more target
locations, and then determine the area around them, within a certain distance.
In Figure 6.20(a), a number of main and minor roads were selected as targets,
and a 75 m (resp. 25 m) buffer was computed from them. In some case studies,
zonated buffers must be determined, for instance in assessments of traffic
noise effects. Most GISs support this type of zonated buffer computation. An
illustration is provided in Figure 6.20(b).



In vector-based buffer generation, the buffers themselves become polygon
features, usually in a separate data layer, that can be used in further spatial
analysis.

Figure 6.20: Buffer zone generation: (a) around main and minor roads. Different
distances were applied: 25 metres for minor roads, 75 metres for main roads.
(b) Zonated buffer zones around main roads. Three different zones were
obtained: at 100 metres from main road, at 200, and at 300 metres.




Buffer generation on rasters is a fairly simple function. The target location or
locations are always represented by a selection of the raster's cells, and
geometric distance is defined, using cell resolution as the unit. The distance
function applied is the Pythagorean distance between the cell centres. The
distance from a non-target cell to the target is the minimal distance one can find
between that non-target cell and any target cell.
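
The raster distance rule just described can be sketched as follows. This is a
brute-force illustration, not how a GIS would implement it efficiently; the
function names, the 25 m cell size and the 40 m buffer distance are assumptions.
The sketch also assumes that at least one target cell exists.

```python
import math

def distance_to_targets(targets, cell_size=1.0):
    """Minimal Pythagorean distance from each cell centre to any target cell."""
    target_cells = [(r, c) for r, row in enumerate(targets)
                    for c, is_target in enumerate(row) if is_target]
    return [[min(math.hypot(r - tr, c - tc) for tr, tc in target_cells) * cell_size
             for c in range(len(row))]
            for r, row in enumerate(targets)]

def buffer_zone(targets, radius, cell_size=1.0):
    """True for every cell whose distance to the nearest target is within radius."""
    return [[d <= radius for d in row]
            for row in distance_to_targets(targets, cell_size)]

# Hypothetical 3 x 3 raster with one target cell in the upper left corner.
targets = [[True, False, False],
           [False, False, False],
           [False, False, False]]

distances = distance_to_targets(targets, cell_size=25.0)
zone = buffer_zone(targets, radius=40.0, cell_size=25.0)
```

A zonated buffer follows from the same distance raster by classifying the
distances into several ranges instead of applying a single threshold.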
Thiessen polygon generation

Thiessen polygon partitions make use of geometric distance for determining
neighbourhoods. This is useful if we have a spatially distributed set of points as
target locations, and we want to know for each location in the study area to
which target it is closest. This technique will generate a polygon around each
target location that identifies all those locations that 'belong to' that target. We
have already seen the use of Thiessen polygons in the context of interpolation of
point data, as discussed in Section 5.4.1. Given an input point set that will be
the polygons' midpoints, it is not difficult to construct such a partition. It is
much easier still to construct if we already have a Delaunay triangulation for the
same input point set (see Section 2.3.3 on TINs).

Figure 6.21 repeats the Delaunay triangulation of Figure 2.9(b). The Thiessen
polygon partition constructed from it is on the right. The construction first
creates the perpendicular bisectors of all the triangle sides; observe that the
perpendicular bisector of a triangle side that connects point A with point B is
the divide between the area closer to A and the area closer to B. The
perpendicular bisectors become part of the boundary of each Thiessen polygon.

Figure 6.21: Thiessen polygon construction (right) from a Delaunay
triangulation (left): perpendicular bisectors of the triangle sides form the
boundaries of the polygons.
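
The 'closest target' idea behind Thiessen polygons can be illustrated on a
raster. Note that this sketch is not the Delaunay-based vector construction
described above: it is a brute-force raster approximation that labels each cell
with the index of its nearest target point. The function name, the point
coordinates and the convention that cell (r, c) has its centre at
(c + 0.5, r + 0.5) are all assumptions.

```python
import math

def thiessen_allocation(points, rows, cols):
    """For each cell centre, return the index of the closest target point (x, y)."""
    def nearest(x, y):
        # Ties are resolved in favour of the point with the lowest index.
        return min(range(len(points)),
                   key=lambda i: math.hypot(x - points[i][0], y - points[i][1]))
    return [[nearest(c + 0.5, r + 0.5) for c in range(cols)]
            for r in range(rows)]

# Two hypothetical target points on a 2 x 4 raster.
points = [(0.5, 0.5), (3.5, 0.5)]
allocation = thiessen_allocation(points, rows=2, cols=4)
```

The boundary between the cells labelled 0 and those labelled 1 approximates the
perpendicular bisector of the line connecting the two points.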
6.4.2 Computation of diffusion

The determination of the neighbourhood of one or more target locations may
depend not only on distance (cases which we discussed above) but also on
direction and differences in the terrain in different directions. This typically is
the case when the target location contains a 'source material' that spreads over
time, referred to as diffusion. This 'source material' may be air, water or soil
pollution, commuters exiting a train station, people from an opened-up refugee
camp, a water spring uphill, or the radio waves emitted from a radio relay
station. In all these cases, one will not expect the spread to occur evenly in all
directions. There will be local terrain factors that influence the spread, making it
easier or more difficult. Many GISs provide support for this type of
computation, and we discuss some of its principles here, in the context of raster
data.

Diffusion computation involves one or more target locations, which are better
called source locations in this context. They are the locations of the source of
whatever spreads. The computation also involves a local resistance raster, which
for each cell provides a value that indicates how difficult it is for the 'source
material' to pass by that cell. The value in the cell must be normalized: i.e. valid
for a standardized length (usually the cell's width) of spread path. From the
source location(s) and the local resistance raster, the GIS will be able to
compute a new raster that indicates how much minimal total resistance the spread
has witnessed for reaching a raster cell. This process is illustrated in
Figure 6.22.

While computing total resistances, the GIS takes proper care of the path lengths.
Obviously, the diffusion from a cell $c_{src}$ to its neighbour cell to the east
$c_e$ is shorter than to the cell that is its northeast neighbour $c_{ne}$. The
distance ratio between these two cases is $1 : \sqrt{2}$. If $val(c)$ indicates the
local resistance value for cell $c$, the GIS computes the total incurred resistance
for diffusion from $c_{src}$ to $c_e$ as $\frac{1}{2}(val(c_{src}) + val(c_e))$,
while the same for $c_{src}$ to $c_{ne}$ is
$\frac{1}{2}(val(c_{src}) + val(c_{ne})) \times \sqrt{2}$. The accumulated
resistance along a path of cells is simply the sum of these incurred resistances
from pairwise neighbour cells.

Figure 6.22: Computation of diffusion on a raster. The lower left green cell is
the source location, indicated in the local resistance raster (a). The raster in (b)
is the minimal total resistance raster computed by the GIS. (The GIS will work
in higher precision real arithmetic than what is illustrated here.)

Since 'source material' has the habit of taking the easiest route to spread, we
must determine at what minimal cost (i.e. at what minimal resistance) it may
have arrived in a cell. Therefore, we are interested in the minimal cost path. To
determine the minimal total resistance along a path from the source location
$c_{src}$ to an arbitrary cell $c_x$, the GIS determines all possible paths from
$c_{src}$ to $c_x$, and then determines which one has the lowest total resistance.
This value is found, for each cell, in the raster of Figure 6.22(b).

For instance, there are three paths from the green source location to its northeast
neighbour cell (with local resistance 5). We can define them as path 1 (N-E),
path 2 (E-N) and path 3 (NE), using compass directions to define the path from
the green cell. For path 1, the total resistance is computed as:

$\frac{1}{2}(4 + 4) + \frac{1}{2}(4 + 5) = 8.5.$

Path 2, in similar style, gives us a total value of 6.5. For path 3, we find

$\frac{1}{2}(4 + 5) \times \sqrt{2} = 6.36,$

and thus it obviously is the minimal cost path. The reader is asked to verify one
or two other values of minimal cost paths that the GIS has produced.
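
The minimal total resistance raster can be computed without enumerating all
paths. The sketch below uses Dijkstra's shortest-path algorithm, which is one
common way to do this and not necessarily what a given GIS does. The 2 × 2
resistance raster is an assumption: the values 4 (source), 4 (north) and 5
(northeast) come from the worked example, and the value 2 for the east
neighbour is derived from the stated total of 6.5 for path 2.

```python
import heapq
import math

def minimal_total_resistance(resistance, source):
    """Dijkstra over the eight-neighbour raster graph, starting at source (row, col)."""
    rows, cols = len(resistance), len(resistance[0])
    total = [[math.inf] * cols for _ in range(rows)]
    total[source[0]][source[1]] = 0.0
    queue = [(0.0, source)]
    while queue:
        cost, (r, c) = heapq.heappop(queue)
        if cost > total[r][c]:
            continue  # stale queue entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                length = math.sqrt(2) if dr and dc else 1.0  # diagonal steps are longer
                step = 0.5 * (resistance[r][c] + resistance[nr][nc]) * length
                if cost + step < total[nr][nc]:
                    total[nr][nc] = cost + step
                    heapq.heappush(queue, (cost + step, (nr, nc)))
    return total

# Rows run from north to south; the source is the lower left cell.
resistance = [[4, 5],
              [4, 2]]
total = minimal_total_resistance(resistance, source=(1, 0))
```

With these values, the northeast cell receives the diagonal total of about 6.36,
in agreement with path 3 above.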
6.4.3 Flow computation

Diffusion computations determine how a phenomenon spreads over the area, in
principle in all directions, though with varying difficulty or resistance. There
are also cases where a phenomenon does not spread in all directions, but moves
or 'flows' along a given, least-cost path, determined again by local terrain
characteristics. The typical case arises when we want to determine the drainage
patterns in a catchment: the rainfall water 'chooses' a way to leave the area.

This principle is illustrated with a simple elevation raster, in Figure 6.23(a). For
each cell in that raster, the steepest downward slope to a neighbour cell is
computed, and its direction is stored in a new raster (Figure 6.23(b)). This
computation determines the elevation difference between the cell and a neighbour
cell, and takes into account cell distance: 1 for neighbour cells in N-S or W-E
direction, $\sqrt{2}$ for cells in NE-SW or NW-SE direction. Among its eight
neighbour cells, it picks the one with the steepest path to it. The directions in
raster (b), thus obtained, are encoded in integer values, and we have 'decoded'
them for the sake of illustration. Raster (b) can be called the flow direction
raster. From raster (b), the GIS can compute the accumulated flow count raster, a
raster that for each cell indicates how many cells have their water flow into the
cell.

Cells with a high accumulated flow count represent areas of concentrated flow,
and thus may belong to a stream. By using some appropriately chosen threshold
value in a map algebra expression, we may decide whether they do. Cells with
an accumulated flow count of zero are local topographic highs, and can be used
to identify ridges.

Figure 6.23: Flow computations on a raster: (a) the original elevation raster,
(b) the flow direction raster computed from it, (c) accumulated flow count raster.
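
Both rasters can be derived with a short sketch. The assumptions here are: the
elevation values are invented (not those of Figure 6.23), directions are stored
as (row, column) offsets rather than integer codes, ties between equally steep
neighbours go to the first one examined, and cells without a lower neighbour get
no direction.

```python
import math

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def flow_direction(elevation):
    """For each cell, the offset to the neighbour with the steepest downward slope."""
    rows, cols = len(elevation), len(elevation[0])
    direction = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            steepest = 0.0
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    distance = math.sqrt(2) if dr and dc else 1.0
                    slope = (elevation[r][c] - elevation[nr][nc]) / distance
                    if slope > steepest:
                        steepest, direction[r][c] = slope, (dr, dc)
    return direction

def flow_accumulation(elevation, direction):
    """For each cell, the number of cells whose water flows into it."""
    rows, cols = len(elevation), len(elevation[0])
    count = [[0] * cols for _ in range(rows)]
    # Visit cells from high to low, so every count is complete before it is passed on.
    cells = sorted(((elevation[r][c], r, c) for r in range(rows) for c in range(cols)),
                   reverse=True)
    for _, r, c in cells:
        if direction[r][c] is not None:
            dr, dc = direction[r][c]
            count[r + dr][c + dc] += count[r][c] + 1
    return count

# Hypothetical elevation raster draining towards the lower right corner.
elevation = [[9, 8, 7],
             [8, 6, 5],
             [7, 5, 3]]
direction = flow_direction(elevation)
count = flow_accumulation(elevation, direction)
```

In this small example the lower right cell collects the water of all eight other
cells, while the cells along the top and left edges have a count of zero.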
6.4.4 Raster based surface analysis

Continuous fields have a number of characteristics not shared by discrete fields.
Since the field changes continuously, we can talk about slope angle, slope aspect
and concavity/convexity of the slope. These notions are not applicable to discrete
fields.

The discussions in this section use terrain elevation as the prototypical
example of a continuous field, but all issues discussed are equally applicable to
other types of continuous fields. Nonetheless, we regularly refer to the
continuous field representation as a DEM, to conform with the most common
situation. Throughout the section we will assume that the DEM is represented as
a raster.
Applications

There are numerous examples where more advanced computations on
continuous field representations are needed. A short list is provided below.


• Slope angle calculation The calculation of the slope steepness, expressed as 
an angle in degrees or percentages, for any or all locations. 

• Slope aspect calculation The calculation of the aspect (or orientation) of the 
slope in degrees (between 0 and 360 degrees), for any or all locations. 

• Slope convexity/concavity calculation Slope convexity—defined as the change 
of the slope (negative when the slope is concave and positive when the 
slope is convex)—can be derived as the second derivative of the field. 

• Slope length calculation With the use of neighbourhood operations, it is
possible to calculate for each cell the nearest distance to a watershed boundary
(the upslope length) and to the nearest stream (the downslope length). This
information is useful for hydrological modelling.

• Hillshading is used to portray relief difference and terrain morphology in
hilly and mountainous areas. The application of a special filter to a DEM
produces hillshading. Filters are discussed later in this section. The colour
tones in a hillshading raster represent the amount of reflected light in each
location, depending on its orientation relative to the illumination source. This
illumination source is usually chosen at an angle of 45° above the horizon
in the north-west.

• Three-dimensional map display With GIS software, three-dimensional views
of a DEM can be constructed, in which the location of the viewer, the angle
under which s/he is looking, the zoom angle, and the amplification factor
of relief exaggeration can be specified. Three-dimensional views can
be constructed using only a predefined mesh, covering the surface, or using
other rasters (e.g. a hillshading raster) or images (e.g. satellite images)
which are draped over the DEM.

• Determination of change in elevation through time The cut-and-fill volume of 
soil to be removed or to be brought in to make a site ready for construction 
can be computed by overlaying the DEM of the site before the work begins 
with the DEM of the expected modified topography. It is also possible to 
determine landslide effects by comparing DEMs of before and after the 
landslide event. 

• Automatic catchment delineation Catchment boundaries or drainage lines
can be automatically generated from a good quality DEM with the use
of neighbourhood functions. The system will determine the lowest point
in the DEM, which is considered the outlet of the catchment. From there,
it will repeatedly search the neighbouring pixels with the highest altitude.
This process is continued until the highest location (i.e. cell with highest
value) is found, and the path followed determines the catchment boundary.
For delineating the drainage network, the process is reversed. Now,
the system will work from the watershed downwards, each time looking
for the lowest neighbouring cells, which determines the direction of water
flow.

• Dynamic modelling Apart from the applications mentioned above, DEMs
are increasingly used in GIS-based dynamic modelling, such as the
computation of surface run-off and erosion, groundwater flow, the delineation
of areas affected by pollution, the computation of areas that will be covered
by processes such as debris flows and lava flows.

• Visibility analysis A viewshed is the area that can be 'seen' (i.e. is in the
direct line-of-sight) from a specified target location. Visibility analysis
determines the area visible from a scenic lookout, the area that can be reached
by a radar antenna, or assesses how effectively a road or quarry will be
hidden from view.


Some of the more important computations mentioned above are further
discussed below. All of them apply a technique known as filtering, so we will first
examine this principle in more detail.
Filtering

The principle of filtering is quite similar to that of moving window averaging,
which we discussed in Section 5.4.2. Again, we define a window and let the
GIS move it over the raster cell-by-cell. For each cell, the system performs some
computation, and assigns the result of this computation to the cell in the output
raster.^ The difference with moving window averaging is that the moving
window in filtering is itself a little raster, which contains cell values that are
used in the computation for the output cell value. This little raster is a filter,
also known as a kernel, which may be square (such as a 3 × 3 kernel), but it does
not have to be. The values in the filter are used as weight factors.

As an example, let us consider a 3 × 3 cell filter, in which all values are equal
to 1, as illustrated in Figure 6.24(a). The use of this filter means that the nine
cells considered are given equal weight in the computation of the filtering step.
Let the input raster cell values, for the current filtering step, be denoted by
$v_{ij}$ and the corresponding filter values by $w_{ij}$. The output value for the
cell under consideration will be computed as the sum of the weighted input
values divided by the sum of weights:

$$\frac{\sum_{i,j} (w_{ij} \cdot v_{ij})}{\sum_{i,j} |w_{ij}|},$$

where one should observe that we divide by the sum of absolute weights.

^ Please refer to Chapter Five of Principles of Remote Sensing for a discussion
of image-related filter operations.

Figure 6.24: Moving window rasters for filtering. (a) raster for a regular
averaging filter; (b) raster for an x-gradient filter; (c) raster for a y-gradient
filter.


Since the $w_{ij}$ are all equal to 1 in the case of Figure 6.24(a), the formula
can be simplified to

$$\frac{1}{9} \sum_{i,j} v_{ij},$$

which is nothing but the average of the nine input raster cell values. So, we see
that an 'all-1' filter computes a local average value, so its application amounts to
moving window averaging. More advanced filters have been devised to extract
other types of information from raster data. We will look at some of these in the
context of slope computations.
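
The filtering step can be sketched as a weighted sum divided by the sum of
absolute weights. Several things in this sketch are assumptions rather than
facts from the text: the function name, the choice to leave cells without a
complete window as None, the one-row layout (−1, 0, 1) of the x-gradient filter,
and the elevation values other than the 1540 m and 1552 m of the example
discussed in the next subsection.

```python
def apply_filter(raster, kernel):
    """Weighted moving-window filter, divided by the sum of absolute weights."""
    rows, cols = len(raster), len(raster[0])
    kr, kc = len(kernel) // 2, len(kernel[0]) // 2  # kernel sides assumed odd
    weight_sum = sum(abs(w) for row in kernel for w in row)
    out = [[None] * cols for _ in range(rows)]      # edge cells stay None
    for r in range(kr, rows - kr):
        for c in range(kc, cols - kc):
            total = sum(kernel[i + kr][j + kc] * raster[r + i][c + j]
                        for i in range(-kr, kr + 1)
                        for j in range(-kc, kc + 1))
            out[r][c] = total / weight_sum
    return out

AVERAGE = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
X_GRADIENT = [[-1, 0, 1]]  # assumed layout: east neighbour minus west neighbour

elevation = [[1538, 1545, 1550],
             [1540, 1546, 1552],
             [1543, 1549, 1560]]

mean = apply_filter(elevation, AVERAGE)[1][1]
x_gradient = apply_filter(elevation, X_GRADIENT)[1][1]
```

Dividing by the sum of absolute weights is what makes the gradient filter work:
its plain weight sum is zero, while the sum of absolute weights is 2, giving the
elevation gain per cell width.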
Computation of slope angle and slope aspect

A different choice of weight factors may provide other information. Special
filters exist to perform computations on the slope of the terrain. Before we look
at these filters, let us define various notions of slope.

Figure 6.25: Slope angle defined. Here, $\delta p$ stands for length in the
horizontal plane, $\delta f$ stands for the change in field value, where the field
usually is terrain elevation. The slope angle is $\alpha$.


Slope angle, which is also known as slope gradient, is the angle $\alpha$,
illustrated in Figure 6.25, between a path p in the horizontal plane and the
sloping terrain. The path p must be chosen such that the angle $\alpha$ is maximal.
A slope angle can be expressed as elevation gain in a percentage or as a
geometric angle, in degrees or radians. The two respective formulas are:

$$\mathit{slope\_perc} = 100 \cdot \frac{\delta f}{\delta p} \quad \text{and} \quad \mathit{slope\_angle} = \arctan\left(\frac{\delta f}{\delta p}\right).$$

The path p must be chosen to provide the highest slope angle value, and thus
it can lie in any direction. The compass direction, converted to an angle with
the North, of this maximal down-slope path p is what we call the slope aspect.
Let us now look at how to compute slope angle and slope aspect in a raster
environment.

From an elevation raster, we cannot 'read' the slope angle or slope aspect
directly. Yet, that information can be extracted. After all, for an arbitrary cell, we
have its elevation value, plus those of its eight neighbour cells. A simple
approach to slope angle computation is to make use of x-gradient and y-gradient
filters. Figure 6.24(b) and (c) illustrate an x-gradient filter and a y-gradient
filter, respectively. The x-gradient filter determines the slope increase ratio from
west to east: if the elevation to the west of the centre cell is 1540 m and that to
the east of the centre cell is 1552 m, then apparently along this transect the
elevation increases 12 m per two cell widths, i.e. the x-gradient is 6 m per cell
width. The y-gradient filter operates entirely analogously, though in south-north
direction.

Observe that both filters express elevation gain per cell width. This means that
we must divide by the cell width (given in metres, for example) to obtain
(approximations to) the true derivatives $\delta f/\delta x$ and $\delta f/\delta y$.
Here, $f$ stands for the elevation field as a function of x and y, and
$\delta f/\delta x$, for instance, is the elevation gain per unit of length in the
x-direction.

To obtain the real slope angle $\alpha$ along path p, observe that both the x- and
y-gradient contribute to it. This is illustrated in Figure 6.26. A not-so-simple
geometric derivation can show that always

$$\tan(\alpha) = \sqrt{(\delta f/\delta x)^2 + (\delta f/\delta y)^2}.$$

Now what does this mean in the practice of computing local slope angles from
an elevation raster? It means that we must perform the following steps:

1. Compute from (input) elevation raster R the non-normalized x- and
y-gradients, using the filters of Figure 6.24(b) and (c), respectively.

2. Normalize the resulting rasters by dividing by the cell width, expressed in
units of length like metres.

3. Use both rasters for generating a third raster, applying the square-root
formula above, possibly even applying an arctan function to the result to obtain
the slope angle $\alpha$ for each cell.
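
For a single cell, steps 2 and 3 can be sketched as follows. The function name
is hypothetical, and of the input values only the x-gradient of 6 m per cell
width comes from the text; the y-gradient of 8 m per cell width and the cell
width of 20 m are assumptions chosen to give round numbers.

```python
import math

def slope_from_gradients(x_gradient, y_gradient, cell_width):
    """Return (slope in percent, slope angle in degrees) for one cell."""
    dfdx = x_gradient / cell_width       # step 2: normalize by the cell width
    dfdy = y_gradient / cell_width
    tan_alpha = math.hypot(dfdx, dfdy)   # step 3: sqrt(dfdx**2 + dfdy**2)
    return 100.0 * tan_alpha, math.degrees(math.atan(tan_alpha))

slope_perc, slope_angle = slope_from_gradients(6.0, 8.0, 20.0)
```

Here the normalized gradients are 0.3 and 0.4, so tan(α) is 0.5: a slope of
50 percent, or an angle of roughly 26.6 degrees.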


It can also be shown that for the slope aspect $\psi$ we have

$$\tan(\psi) = \frac{\delta f/\delta x}{\delta f/\delta y},$$

so slope aspect can also be computed from the normalized gradients. We must
warn the reader that this formula should not trivially be replaced by using

$$\psi = \arctan\left(\frac{\delta f/\delta x}{\delta f/\delta y}\right),$$

the reason being that the latter formula does not account for southeast and
southwest quadrants, nor for cases where $\delta f/\delta y = 0$. (In the first
situation, one must add 180° to the computed angle to obtain an angle measured
from North; in the latter situation, $\psi$ equals either 90° or −90°, depending on
the sign of $\delta f/\delta x$.)

Figure 6.26: Slope angle and slope aspect defined. Here, p is the horizontal
path in maximal slope direction and $\alpha$ is the slope angle. The plane tangent
to the terrain in the origin is also indicated. The angle $\psi$ is the slope aspect.
See the text for further explanation.
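
In code, the quadrant and zero-denominator problems are usually avoided with a
two-argument arctangent. The sketch below rests on assumptions: the function
name is hypothetical, the aspect is taken as the down-slope direction (following
the definition of slope aspect given earlier), it is reported in degrees
clockwise from North between 0 and 360, and flat cells get no aspect at all.

```python
import math

def slope_aspect(dfdx, dfdy):
    """Down-slope compass direction in degrees from North, or None on flat terrain."""
    if dfdx == 0 and dfdy == 0:
        return None
    # The gradient (dfdx, dfdy) points up-slope, so negate it to look down-slope.
    # atan2(east component, north component) measures the angle from North.
    return math.degrees(math.atan2(-dfdx, -dfdy)) % 360.0

west_facing = slope_aspect(0.3, 0.0)          # terrain rises towards the east
south_facing = slope_aspect(0.0, 0.4)         # terrain rises towards the north
northeast_facing = slope_aspect(-0.3, -0.3)   # terrain falls towards the northeast
```

Whether an aspect value refers to the up-slope or the down-slope direction
differs between software packages, so the sign convention used here should be
checked against the package at hand.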
6.5 Network analysis


A completely different set of analytical functions in GIS consists of computations
on networks. A network is a connected set of lines, representing some geographic
phenomenon, typically of the transportation type. The 'goods' transported can
be almost anything: people, cars and other vehicles along a road network,
commercial goods along a logistic network, phone calls along a telephone
network, or water pollution along a stream/river network.

Network analysis can be performed on either raster or vector data layers, but
it is more commonly done on the latter, as line features can be associated
with a network, and hence can be assigned typical transportation characteristics
such as capacity and cost per unit. A fundamental characteristic of any network
is whether the network lines are considered directed or not. Directed networks
associate with each line a direction of transportation; undirected networks do not.
In the latter, the 'goods' can be transported along a line in both directions. We
discuss here vector network analysis, and assume that the network is a set of
connected line features that intersect only at the lines' nodes, not at internal
vertices. (But we do mention under- and overpasses.)

For many applications of network analysis, a planar network, i.e. one that can
be embedded in a two-dimensional plane, will do the job. Many networks are
naturally planar, like stream/river networks. A large-scale traffic network, on
the other hand, is not planar: motorways have multi-level crossings and are
constructed with underpasses and overpasses. Planar networks are easier to deal
with computationally, as they have simpler topological rules.

Not all GISs accommodate non-planar networks; some can do so only using 'tricks'.
These may involve the splitting of overpassing lines at the intersection vertex
and the creation of four lines out of the two original lines. Without further
attention, the network will then allow one to make a turn onto another line at this
new intersection node, which in reality would be impossible. In some GISs we
can associate a cost with turning at a node (see our discussion on turning costs
below), and that cost, in the case of the overpass, can be made infinite to
ensure the turn is prohibited. But, as mentioned, this is a workaround to fit a
non-planar situation into a data layer that presumes planarity.

The above is a good illustration of geometry not fully determining the network's
behaviour. Additional application-specific rules are usually required to define
what can and cannot happen in the network. Most GISs provide rule-based
tools that allow the definition of these extra application rules.

Various classical spatial analysis functions on networks are supported by GIS 
software packages. The most important ones are: 

1. Optimal path finding which generates a least-cost path on a network
between a pair of predefined locations using both geometric and attribute
data.

2. Network partitioning which assigns network elements (nodes or line
segments) to different locations using predefined criteria.

We discuss these two typical functions in the sections below. 
Optimal path finding


Optimal path finding techniques are used when a least-cost path between two
nodes in a network must be found. The two nodes are called origin and
destination, respectively. The aim is to find a sequence of connected lines to traverse
from the origin to the destination at the lowest possible cost.
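One well-known way to find such a least-cost path is Dijkstra's algorithm. The sketch below is a minimal illustration under our own assumptions, not the method of any specific GIS package: the network is given as a list of (from_node, to_node, cost) tuples with non-negative costs, and the function name `least_cost_path` is hypothetical.

```python
import heapq

def least_cost_path(lines, origin, destination, directed=False):
    """Return (total_cost, [nodes]) for the cheapest path, or (inf, []) if none exists.

    lines: iterable of (from_node, to_node, cost) tuples with non-negative costs.
    """
    # Build an adjacency list; undirected lines can be traversed both ways.
    neighbours = {}
    for a, b, cost in lines:
        neighbours.setdefault(a, []).append((b, cost))
        if not directed:
            neighbours.setdefault(b, []).append((a, cost))

    best = {origin: 0.0}   # cheapest known cost to reach each node
    previous = {}          # back-pointers used to rebuild the path
    queue = [(0.0, origin)]
    while queue:
        cost_so_far, node = heapq.heappop(queue)
        if node == destination:
            break
        if cost_so_far > best.get(node, float("inf")):
            continue       # stale queue entry, a cheaper route was found already
        for nxt, cost in neighbours.get(node, []):
            new_cost = cost_so_far + cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                previous[nxt] = node
                heapq.heappush(queue, (new_cost, nxt))

    if destination not in best:
        return float("inf"), []
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    return best[destination], path[::-1]
```

With `directed=True` each line can be traversed only from its first node to its second, which corresponds to the directed networks discussed earlier.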

The cost function can be simple: for instance, it can be defined as the total length
of all lines on the path. The cost function can also be more elaborate and take into
account not only the length of the lines, but also their capacity, maximum
transmission (travel) rate and other line characteristics, for instance to obtain a
reasonable approximation of travel time. There can even be cases in which the nodes
visited add to the cost of the path as well. These may be called turning costs, which
are defined in a separate turning cost table for each node, indicating the cost of
turning at the node when entering from one line and continuing on another. This is
illustrated in Figure 6.27.
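To make the idea of a turning cost table concrete, the following sketch adds turning costs to the cost of a given path. The data layout is an assumption of ours: the table is a dictionary keyed by (node, from_line, to_line), and a turn that is absent from the table is treated as prohibited, just like a turn with infinite cost. The function name `path_cost` is hypothetical.

```python
INF = float("inf")

def path_cost(lines, turn_nodes, line_cost, turning_cost):
    """Total cost of a path: line costs plus the turning cost at each node passed.

    lines: line ids in travel order.
    turn_nodes: the node between each pair of consecutive lines.
    line_cost: dict mapping line id -> cost of traversing that line.
    turning_cost: dict mapping (node, from_line, to_line) -> cost of that turn;
        turns missing from the table count as prohibited (infinite cost).
    """
    if len(turn_nodes) != len(lines) - 1:
        raise ValueError("need exactly one turn node between consecutive lines")
    total = sum(line_cost[line] for line in lines)
    for node, from_line, to_line in zip(turn_nodes, lines, lines[1:]):
        total += turning_cost.get((node, from_line, to_line), INF)
    return total
```

A path that contains a prohibited turn thus receives an infinite cost and will never be selected as the optimal path.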
Figure 6.27: Network neighbourhood of node N with associated turning
costs at N. Turning at N onto c is prohibited because of direction, so
no costs are mentioned for turning onto c. A turning cost of infinity (∞)
means that it is also prohibited.


The attentive reader will notice that it is possible to travel on line b in Figure 6.27,
then take a U-turn at node N, and return along a to where one came from. The
question is whether doing this makes sense in optimal path finding. After all,
going back to where one came from will only increase the total cost. In fact,
there are situations where it is optimal to do so. Suppose it is node M that
is connected by line b with node N, and that we actually wanted to travel to
another node L from M. The turn at M towards node L, when arriving via another
line, may be prohibitively expensive, whereas turning towards L at M after
returning to M along b may not be so expensive.

Problems related to optimal path finding are ordered optimal path finding and
unordered optimal path finding. Both have an extra requirement that a number
of additional nodes needs to be visited along the path. In ordered optimal
path finding, the sequence in which these extra nodes are visited matters; in
unordered optimal path finding it does not. An illustration of both types is
provided in Figure 6.28. Here, a path is found from node A to node D, visiting nodes
B and C. Obviously, the length of the path found under unordered requirements
is at most as long as the one found under ordered requirements. Some
GISs provide support for these more complicated path finding problems.
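The difference between the two variants can be sketched as follows. We assume, purely for illustration, that the least cost between every pair of relevant nodes is already available in a dictionary `cost`; the function names are our own. The unordered variant simply tries every visiting order, which is only practical for a handful of extra nodes.

```python
from itertools import permutations

def route_cost(stops, cost):
    """Sum of the least-cost path costs between consecutive stops."""
    return sum(cost[(a, b)] for a, b in zip(stops, stops[1:]))

def ordered_route(origin, destination, visits, cost):
    """Visit the extra nodes in exactly the order given."""
    stops = [origin, *visits, destination]
    return route_cost(stops, cost), stops

def unordered_route(origin, destination, visits, cost):
    """Try every visiting order and keep the cheapest one (brute force)."""
    candidates = ([origin, *order, destination] for order in permutations(visits))
    return min((route_cost(stops, cost), stops) for stops in candidates)
```

Because the ordered sequence is one of the candidates that the unordered search examines, the unordered result can never be more expensive than the ordered one.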
Figure 6.28: Ordered (a) and unordered (b) optimal path finding. In both
cases, a path had to be found from A to D, in (a) by visiting B and then C,
in (b) also by visiting both nodes, but in arbitrary order.
Network partitioning

In network partitioning, the purpose is to assign lines and/or nodes of the
network, in a mutually exclusive way, to a number of target locations. Typically,
the target locations play the role of service centre for the network. This may be
any type of service: medical treatment, education, water supply. This type of
network partitioning is known as a network allocation problem.

Another problem is trace analysis. Here, one wants to determine that part of the
network that is upstream (or downstream) from a given target location. Such
problems exist in pollution tracing along river/stream systems, but also in
network failure chasing in energy distribution networks.


Network allocation

In network allocation, we have a number of target locations that function as
resource centres, and the problem is which part of the network to exclusively
assign to which service centre. This may sound like a simple
allocation problem, in which a service centre is assigned those line (segments)
to which it is nearest, but usually the problem statement is more complicated.
These further complications stem from the requirements to take into account 


• The capacity with which a centre can produce the resources (whether they
are medical operations, school pupil positions, kilowatts, or bottles of milk),
and

• The consumption of the resources, which may vary amongst lines or line
segments. After all, some streets have more accidents, more children who
live there, more industry in high demand of electricity or just more thirsty
workers.



connected part of the network. Various techniques exist to assign network lines,
or their segments, to a centre. In Figure 6.29(a), the green star indicates a
primary school and the GIS has been used to assign streets and street segments
to the closest school within 2 km distance, along the network. Then, using
demographic figures of pupils living along the streets, it was determined that too
many potential pupils lived in the area for the school's capacity. So in part (b),
the already selected part of the network was reduced to accommodate precisely
the school's pupil capacity for the new year.
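The two steps of the school example can be sketched as below. This is one simple heuristic under assumptions of our own, not necessarily the technique used to produce Figure 6.29: each street segment comes with a precomputed network distance to the school and a pupil count, segments are allocated nearest first, and allocation stops at the first segment that no longer fits. The sketch does not check that the allocated part stays connected, and all names are hypothetical.

```python
def allocate_segments(segments, max_distance, capacity):
    """Select a service area and trim it to the centre's capacity.

    segments: list of (segment_id, network_distance, demand) tuples.
    Returns (service_area, allocated): the ids within max_distance, and the
    nearest-first subset whose summed demand does not exceed capacity.
    """
    in_reach = sorted((s for s in segments if s[1] <= max_distance),
                      key=lambda s: s[1])
    service_area = [seg_id for seg_id, _, _ in in_reach]
    allocated, used = [], 0
    for seg_id, _, demand in in_reach:
        if used + demand > capacity:
            break  # stop at the first segment that no longer fits
        allocated.append(seg_id)
        used += demand
    return service_area, allocated
```

The first list corresponds to part (a) of the example, the second to the reduced selection of part (b).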


Figure 6.29: Network allocation on a pupil/school assignment problem. In
(a), the street segments within 2 km of the school are identified; in (b), the
selection of (a) is further restricted to accommodate the school’s capacity for
the new year.


Trace analysis

Trace analysis is performed when we want to understand which part of a network
is 'conditionally connected' to a chosen node on the network, known as the trace
origin. For a node or line to be conditionally connected means that a path exists
from the node/line to the trace origin, and that the connecting path fulfills the
conditions set. What these conditions are depends
on the application, and they may involve direction of the path, capacity, length,
or resource consumption along it. The condition typically is a logical expression,
as we have seen before, for instance:


• The path must be directed from the node/line to the trace origin, 

• Its capacity (defined as the minimum capacity of the lines that constitute 
the path) must be above a given threshold, and 

• The path's length must not exceed a given maximum length. 


Tracing is the computation that the GIS performs to find the paths from the trace 
origin that obey the tracing conditions. It is a rather useful function for many 
network-related problems. 



Figure 6.30: Tracing functions on a network: (a) tracing upstream,
(b) tracing downstream, (c) tracing without conditions on direction.
In Figure 6.30 our trace origin is indicated in red. In part (a), the tracing
conditions were set to trace all the way upstream; part (b) traces all the way
downstream, and in part (c) there are no conditions on direction of the path, thereby
tracing all connected lines from the trace origin. More complex conditions are
certainly possible in tracing.
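The three traces of Figure 6.30 can be sketched with a breadth-first search. The representation is our own assumption: each line is a (from_node, to_node) pair directed along the flow, and only the direction condition is implemented. Conditions on capacity or path length would need extra bookkeeping per path and are left out.

```python
from collections import deque

def trace(lines, origin, direction="both"):
    """Return the set of nodes connected to origin under the direction condition.

    lines: iterable of (from_node, to_node) pairs, directed along the flow.
    direction: "upstream", "downstream" or "both" (no condition on direction).
    """
    if direction not in ("upstream", "downstream", "both"):
        raise ValueError("direction must be 'upstream', 'downstream' or 'both'")
    step = {}
    for a, b in lines:
        if direction in ("downstream", "both"):
            step.setdefault(a, set()).add(b)   # follow the flow
        if direction in ("upstream", "both"):
            step.setdefault(b, set()).add(a)   # go against the flow
    reached, queue = {origin}, deque([origin])
    while queue:
        node = queue.popleft()
        for nxt in step.get(node, ()):
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached
```

In a river network, an upstream trace from a polluted node yields the candidate source locations, while a downstream trace yields the part of the network that may become affected.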
6.6 GIS and application models


We have discussed the notion that real world processes are often highly complex.
Models are simplified abstractions of reality representing or describing its most
important elements and their interactions. Modelling and GIS are more or less
inseparable, as GIS is itself a tool for modelling 'the real world' (or at least some
part of it).

The solution to a (spatial) problem usually depends on a (large) number of
parameters. Since these parameters are often interrelated, their interaction is made
more precise in an application model.

Here we define application models to include any kind of GIS based model
(including so-called analytical and process models) for a specific real-world
application. Such a model, in one way or another, describes as faithfully as possible
how the relevant geographic phenomena behave, and it does so in terms of the
parameters.

The nature of application models varies enormously. GIS applications for famine
relief programs, for instance, are very different from earthquake risk assessment
applications, though both can make use of GIS to derive a solution. Many kinds
of application models exist, and they can be classified in many different ways.
Here we identify five characteristics of GIS-based application models:

1. The purpose of the model,

2. The methodology underlying the model,

3. The scale at which the model works,

4. Its dimensionality - i.e. whether the model includes spatial, temporal or
spatial and temporal dimensions, and

5. Its implementation logic - i.e. the extent to which the model uses existing
knowledge about the implementation context.

It is important to note that the categories above are merely different characteristics
of any given application model. Any model can be described according to these
characteristics. Each is briefly discussed below.

Purpose of the model refers to whether the model is descriptive, prescriptive
or predictive in nature. Descriptive models attempt to answer the "what is"
question. Prescriptive models usually answer the "what should be" question by
determining the best solution from a given set of conditions.

Models for planning and site selection are usually prescriptive, in that they
quantify environmental, economic and social factors to determine 'best' or
optimal locations. So-called predictive models focus upon the "what is likely to be"
questions, and predict outcomes based upon a set of input conditions. Examples
of predictive models include forecasting models, such as those attempting
to predict landslides or sea-level rise.

Methodology refers to the operational components of the model. Stochastic
models use statistical or probability functions to represent random or semi-random
behaviour of phenomena. In contrast, deterministic models are based upon
a well-defined cause and effect relationship. Examples of deterministic models
include hydrological flow and pollution models, where the 'effect' can often be
described by numerical methods and differential equations.

Rule-based models attempt to model processes by using local (spatial) rules.
Cellular Automata (CA) are examples of models in this category. These are often
used to understand systems which are generally not well understood, but for
which their local processes are well known. For example, the characteristics of
neighbouring cells (such as wind direction and vegetation type) in a raster-based
CA model might be used to model the direction of spread of a fire over
several time steps.
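A toy version of such a fire-spread automaton is sketched below. The local rule is an invention of ours for illustration only, not a validated fire model: a burning cell burns out after one time step and ignites its four direct neighbours that contain fuel, except the neighbour that lies against the wind. The cell states and the `wind` convention (the row/column offset towards which the wind blows) are likewise assumptions.

```python
EMPTY, FUEL, BURNING, BURNT = 0, 1, 2, 3

def step(grid, wind=(0, 1)):
    """Advance the fire by one time step and return the new raster.

    grid: list of rows holding the cell states defined above.
    wind: (row, column) offset towards which the wind blows; (0, 1) is eastward.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    against_wind = (-wind[0], -wind[1])
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != BURNING:
                continue
            new[r][c] = BURNT  # a burning cell burns out after one step
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                if (dr, dc) == against_wind:
                    continue   # local rule: no spread against the wind
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == FUEL:
                    new[nr][nc] = BURNING
    return new
```

Calling `step` repeatedly on its own output produces the sequence of time steps. Replacing the fixed rule by an ignition probability would turn this deterministic sketch into a stochastic model.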

Agent-based models (ABM) attempt to model movement and development of
multiple interacting agents (which might represent individuals), often using sets of
decision-rules about what the agent can and cannot do. Complex agent-based
models have been developed to understand aspects of travel behaviour and
crowd interactions which also incorporate stochastic components.


Scale refers to whether the components of the model are individual or
aggregate in nature. Essentially this refers to the 'level' at which the model operates.
Individual-based models are based on individual entities, such as the agent-based
models described above, whereas aggregate models deal with 'grouped' data,
such as population census data. Aggregate models may operate on data at the
level of a city block (for example, using population census data for particular
social groups), at the regional, or even at a global scale.


Dimensionality is the term chosen to refer to whether a model is static or
dynamic, and spatial or aspatial. Some models are explicitly spatial, meaning they
operate in some geographically defined space. Some models are aspatial,
meaning they have no direct spatial reference.

Models can also be static, meaning they do not incorporate a notion of time or
change. In dynamic models, time is an essential parameter (see Section 2.5).
Dynamic models include various types of models referred to as process models or
simulations. These types of models aim to generate future scenarios from
existing scenarios, and might include deterministic or stochastic components, or
some kind of local rule (for example, to drive a simulation of urban growth and
spread). The fire spread example given above is a good example of an explicitly
spatial, dynamic model which might incorporate both local rules and stochastic
components.


Implementation logic refers to how the model uses existing theory or
knowledge to create new knowledge. Deductive approaches use knowledge of the
overall situation in order to predict outcome conditions. This includes models that
have some kind of formalized set of criteria, often with known weightings for
the inputs, and existing algorithms are used to derive outcomes. Inductive
approaches, on the other hand, are less straightforward, in that they try to
generalize (often based upon samples of a specific data set) in order to derive more
general models. While an inductive approach is useful if we do not know the
general conditions or rules which apply in a given domain, it is typically a
trial-and-error approach which requires empirical testing to determine the
parameters of each input variable.

Most GISs come equipped with only a limited range of tools for modelling. For
complex models, or functions which are not natively supported in our GIS,
external software environments are frequently used. In some cases, GIS and models
can be fully integrated (known as embedded coupling) or linked through data and
interface (known as tight coupling). If neither of these is possible, the external
model might be run independently of our GIS, and the output exported from
our model into the GIS for further analysis and visualization. This is known as
loose coupling.

It is important to compare our model results with previous experiments and to 
examine the possible causes of inconsistency between the output of our models 
and the expected results. The following section discusses these aspects further. 
6.7 Error propagation in spatial data processing

6.7.1 How errors propagate


In Section 5.2, we discussed a number of sources of error that may be present
in source data. It is important to note that the acquisition of base data to a high
standard of quality still does not guarantee that the results of further, complex
processing can be treated with certainty. As the number of processing steps
increases, it becomes difficult to predict the behaviour of error propagation. These
various errors may affect the outcome of spatial data manipulations. In addition,
further errors may be introduced during the various processing steps discussed
earlier in this chapter, as illustrated in Figure 6.31.
Figure 6.31: Error propagation in spatial data handling


One of the most commonly applied operations in geographic information
systems is analysis by overlaying two or more spatial data layers. As discussed
above, each such layer will contain errors, due to both inherent inaccuracies in
the source data and errors arising from some form of computer processing, for
example, rasterization. During the process of spatial overlay, all the errors in the
individual data layers contribute to the final error of the output. The amount of
error in the output depends on the type of overlay operation applied. For
example, errors in the results of overlay using the logical operator AND are not the
same as those created using the OR operator.
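The difference between AND and OR can be illustrated with a small calculation. The sketch below rests on strong simplifying assumptions that are ours, not the text's: the layers are Boolean, each layer records a cell correctly with a known probability, and errors in different layers are independent. The function name `overlay_accuracy` is hypothetical.

```python
from itertools import product

def overlay_accuracy(true_values, accuracies, operator):
    """Probability that a Boolean overlay yields the correct result for one cell.

    true_values: the true Boolean value of the cell in each layer.
    accuracies: probability that each layer records its value correctly.
    operator: all (logical AND) or any (logical OR).
    """
    truth = operator(true_values)
    prob_correct = 0.0
    # Enumerate every combination of 'layer is wrong' / 'layer is right'.
    for flips in product((False, True), repeat=len(true_values)):
        observed = [value != flipped for value, flipped in zip(true_values, flips)]
        p = 1.0
        for flipped, accuracy in zip(flips, accuracies):
            p *= (1.0 - accuracy) if flipped else accuracy
        if operator(observed) == truth:
            prob_correct += p
    return prob_correct
```

For a cell that is truly present in both layers, with layer accuracies 0.9 and 0.8, the AND result is correct only when both layers are correct (0.9 × 0.8 = 0.72), whereas the OR result is wrong only when both layers are wrong (1 − 0.1 × 0.2 = 0.98). For a cell that is truly absent from both layers the roles of the two operators are reversed.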

Table 6.2 lists common sources of error introduced into GIS analyses. Note that
these are from a wide range of sources, and include various common tasks
relating to both data preparation and data analysis. It is the combination of the
different errors generated at each stage of preparation and analysis that may
bring about various errors and uncertainties in the eventual outputs.

Consider another example. A land use planning agency is faced with the
problem of identifying areas of agricultural land that are highly susceptible to
erosion. Such areas occur on steep slopes in areas of high rainfall. The spatial data
used in a GIS to obtain this information might include:


• A land use map produced five years previously from 1 : 25,000 scale aerial
photographs,

• A DEM produced by interpolating contours from a 1 : 50,000 scale
topographic map, and

• Annual rainfall statistics collected at two rainfall gauges.


The reader is invited to assess what sort of errors are likely to occur in this
analysis.

Referring back to Figure 6.31, the reader is also encouraged to reflect on errors
introduced in components of application models discussed in the previous section,
specifically in the methodological aspects of representing geographic phenomena.
What might be the consequences of using a random function in an urban
transportation model (when, in fact, travel behaviour is not purely random)?
Coordinate adjustments: rubber sheeting/transformations; projection changes;
datum conversions; rescaling.

Generalization: linear alignment; line simplification; addition/deletion of
vertices; linear displacement.

Feature editing: line snapping; extension of lines to intersection; reshaping;
moving/copying; elimination of spurious polygons.

Raster/vector conversions: raster cells to polygons; polygons to raster cells;
assignment of point attributes to raster cells; post-scanner line thinning.

Attribute editing: numeric calculation and change; text value
changes/substitution; re-definition of attributes; attribute value update.

Data input and management: digitizing; scanning; topological construction /
spatial indexing; dissolving polygons with same attributes.

Boolean operations: polygon on polygon; polygon on line; polygon on point;
line on line; overlay and erase/update.

Surface modelling: contour/lattice generation; TIN formation; draping of data
sets; cross-section/profile generation; slope/aspect determination.

Display and analysis: cluster analysis; calculation of surface lengths; shortest
route/path computation; buffer creation; display and query;
adjacency/contiguity; class intervals choice; areal interpolation;
perimeter/area size/volume computation; distance computation; spatial
statistics; label/text placement.


Table 6.2: Some of the most common causes of error in spatial data handling.
Source: Hunter & Beard [23].
6.7.2 Quantifying error propagation

Chrisman [13] noted that "the ultimate arbiter of cartographic error is the real
world, not a mathematical formulation". It is an unavoidable fact that we will
never be able to capture and represent everything that happens in the real world
perfectly in a GIS. Hence there is much to recommend the use of testing
procedures for accuracy assessment.

Various perspectives, motives and approaches to dealing with uncertainty have
given rise to a wide range of conceptual models and indices for the description
and measurement of error in spatial data. All these approaches have their
origins in academic research and have strong theoretical bases in mathematics and
statistics. Here we identify two main approaches for assessing the nature and
amount of error propagation:

1. Testing the accuracy of each state by measurement against the real world, and

2. Modelling error propagation, either analytically or by means of simulation
techniques.


Modelling of error propagation has been defined by Veregin [56] as: "the
application of formal mathematical models that describe the mechanisms whereby
errors in source data layers are modified by particular data transformation
operations." In other words, we would like to know how errors in the source data
behave under manipulations that we subject them to in a GIS. If we are able
to quantify the error in the source data as well as their behaviour under GIS
manipulations, we have a means of judging the uncertainty of the results.
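The simulation approach can be sketched for numerical attributes as follows. The set-up is an assumption made for illustration: the input errors are taken to be independent and normally distributed with known standard deviations, and the GIS manipulation is represented by an ordinary Python function. The name `simulate_propagation` is hypothetical.

```python
import random
import statistics

def simulate_propagation(operation, values, std_devs, runs=20000, seed=0):
    """Estimate the mean and standard deviation of operation(*inputs) under input error.

    values: the recorded attribute values that are fed into the operation.
    std_devs: assumed standard deviation of the error in each value.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    outputs = []
    for _ in range(runs):
        # Perturb every input with an independent, normally distributed error.
        noisy = [rng.gauss(v, s) for v, s in zip(values, std_devs)]
        outputs.append(operation(*noisy))
    return statistics.mean(outputs), statistics.stdev(outputs)
```

For the sum of two attribute values with error standard deviations 3 and 4, the simulated standard deviation should come out close to 5, the value that the analytical rule for adding independent errors gives (the square root of 3² + 4²). For more complicated operations no such simple rule may be available, and simulation is then a practical alternative.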
Error propagation models are very complex and valid only for certain data types
(e.g. numerical attributes). Initially, they described only the propagation of
attribute error [21, 56]. More recent research has addressed the spatial aspects of
error propagation and the development of models incorporating both attribute
and locational components. These topics are outside the scope of this book, and
readers are referred to [2, 27] for more detailed discussions. Rather than
explicitly modelling error propagation, it is often more practical to test the results
of each step in the process against some independently measured reference data.
Summary


This chapter has examined various ways of manipulating both raster and vector
based spatial data sets. It is certainly true that some types of manipulations are
better accommodated in one, and not so well in the other. Usually, one chooses
the format to work with on the basis of many more parameters, including the
availability of source data.

We have identified several classes of data manipulations or functions. The first
of these does not generate new spatial data, but rather extracts (i.e. 'makes
visible') information from existing data sets. Amongst these are the
measurement functions. These allow us to determine scalar values such as length,
distance, and area size of selected features. Spatial selections allow us to selectively
identify features on the basis of conditions, which may be spatial in character.

A second class of spatial data manipulations generates new spatial data sets.
Classification functions assign a new characteristic value to each feature in a set
of (previously selected) features. Spatial overlay functions go a step further and
combine two spatial data sets by location. What is produced as an output
spatial data set depends on user requirements, and the data format with which one
works. Most of the vector spatial overlays are based on polygon/polygon
intersection, or polygon/line intersections. In the raster domain, we have seen the
powerful tool of raster calculus, which allows all sorts of spatial overlay
conditions and output expressions, all based on cell by cell comparisons and
computations.

Going beyond spatial overlays are the neighbourhood functions. Their principle
is not 'equal location comparison' but they instead focus on the definition of the
vicinity of one or more features. This is useful for applications that attempt to
assess the effect of some phenomenon on its environment. The simplest
neighbourhood functions are insensitive to direction, i.e. will deal with all directions
equally. Good examples are buffer computations on vector data. More advanced
neighbourhood functions take into account local context, and therefore are
sensitive to direction. Since such local factors are more easily represented in raster
data, this is then the preferred format. Flow and diffusion functions are
examples.

We also looked at a special type of spatial data, namely (line) networks, and the
functions that are used on these. Optimal path finding is one such function,
useful in routing problems. The use of this function can be constrained or
unconstrained. Another function often used on networks is network partitioning: how
to assign respective parts of the network to resource locations.

Various combinations of the analytical functions discussed above can be used
in an application model to simulate a given geographical process or phenomenon.
The output generated by these models can then be used in various ways,
including decision support and planning. Many different kinds of models exist, and
the type of model used will depend on the process or phenomenon under study,
the nature of the data, and the type of output desired from the model.

The final section of this chapter discussed the issue of error propagation. It was
noted that at each stage of working with spatial data, errors can be introduced
which can propagate through the different operations. These errors can range
from simple mistakes in data entry through to inappropriate estimation
techniques or functions in operational models, and can serve to degrade the 'end
result' of our analyses significantly.
Questions

1. On page 352, we discussed the measurement function of distance between
vector features. Draw six diagrams, each of which contains two arbitrary
vector features, being either a point, a polyline, or a polygon. Then,
indicate the minimal distance, and provide a short description of how this
could have been computed.

2. On page 352, we mentioned that two polygons can only intersect when 
their minimal bounding boxes overlap. Provide a counter-example of the 
inverted statement, in other words, show that if their minimal bounding 
boxes overlap, the two polygons may still not intersect (or meet, or have 
one contained in the other). 

3. In Figure 6.11 we provided an example of automatic classification. Rework
the example and show what the results would be for three (instead of five)
classes, both with equal interval classification and equal frequency
classification.

4. In Figure 6.9, we provided a classification of average household income 
per ward in the city of Dar es Salaam. Provide a (spatial) interpretation of 
that figure. 

5. Observe that the equal frequency technique applied on the raster of Figure 6.11 
does not really produce categories with equal frequencies. Explain why 
this is. Would we expect a better result if our raster had been 5,000 x 5,000 
cells? 
6. When discussing vector overlay operators, we observed that the one
fundamental operator was polygon intersection, and that other operators were
expressible in terms of it. The example we gave showed this for polygon
overwrite. Draw up a series of sketches that illustrates the procedure.
Then, devise a technique of how polygon clipping can be expressed and
illustrate this too.

7. Argue why diffusion computations are much more naturally supported by 
raster data than by vector data. 

8. In Figure 6.22(b), each cell was assigned the minimum total resistance of 
a path from the source location to that cell. Verify the two values of 14.50 
and 14.95 of the top left cells by doing the necessary computations. 

9. In Figure 6.23, we illustrated drainage pattern computations on the basis 
of an elevation raster. Pick two arbitrary cells, and determine how water 
from those cells will flow through the area described by the raster. Which 
raster cell can be called the 'water sink' of the area? 

10. In Section 6.4.4, we have more or less tacitly assumed throughout to be
operating on elevation rasters. All the techniques discussed, however,
apply equally well to other continuous field rasters, for instance, for NDVI,
population density, or groundwater salinity. Explain what slope angle and
slope aspect computations mean for such fields.
7.1 GIS and maps

There is a strong relationship between maps and GIS. More specifically, maps
can be used as input for a GIS. They play a key role in relation to all the
functional components of a GIS shown in Figure 3.1.

As soon as a question contains a "where?" element, a map can often be the most
suitable tool to answer it. "Where do I find
Enschede?" and "Where did ITC's students come from?" are both examples.
Of course, the answers could be in non-map form like "in the Netherlands" or
"from all over the world." These answers may be satisfactory, but they do
not give the full picture.

A map would put these answers in a spatial context. It could show where in 
the Netherlands Enschede is to be found and where it is located with respect to 
Schiphol-Amsterdam airport, where most students arrive. A world map would 
refine the answer "from all over the world," since it reveals that most students 
arrive from Africa and Asia, and only a few come from the Americas, Australia 
and Europe as can be seen in Figure 7.1. 

As soon as the location of geographic objects ("where?") is involved, a map
becomes useful. However, maps can do more than just provide information on
location. They can also inform about the thematic attributes of the geographic
objects located in the map. An example would be "What is the predominant land
use in southeast Twente?" The answer could, again, just be verbal and state
"Urban." However, such an answer does not reveal patterns. In Figure 7.2, a
dominant northwest-southeast urban buffer can be clearly distinguished. Maps can
answer the "What?" question only in relation to location (the map as a reference
frame).

A third type of question that can be answered from maps is related to "When?"
For instance, "When did the Netherlands have its longest coastline?" The
answer might be "1600," and this will probably be satisfactory to most people.
However, it might be interesting to see how this changed over the years. A set
of maps could provide the answer as demonstrated in Figure 7.3.

To summarize, maps can deal with questions/answers related to the basic
components of spatial or geographic data: location (geometry), characteristics
(thematic attributes) and time, and their combination.

As such, maps are the most efficient and effective means to transfer spatial
information. The map user can locate geographic objects, while the shape and
colour of signs and symbols representing the objects inform about their
characteristics. They reveal spatial relations and patterns, and offer the user
insight in and overview of the distribution of particular phenomena. An
additional characteristic of on-screen maps is that these are often interactive
and have a link to a database, and as such allow for more complex queries.

Figure 7.1: Maps and location: "Where did ITC cartography students come from?"
Map scale is 1 : 200,000,000.

Figure 7.2: Maps and characteristics: "What is the predominant land use in
southeast Twente?"

Looking at the maps above demonstrates an important quality of maps: the
ability to offer an abstraction of reality. A map simplifies by leaving out
certain details, but at the same time it puts (when well-designed) the remaining
information in a clear perspective. The map in Figure 7.1 only needs the
boundaries of countries, and a symbol to represent the number of students per
country. In this particular case there is no need to show cities, mountains,
rivers or other phenomena.

Figure 7.3: Maps and time: "When did the Netherlands have its longest
coastline?"

This characteristic is well illustrated when one puts the map next to an aerial
photograph or satellite image of the same area. Products like these give all
information observed by the capture devices used. Figure 7.4 shows an aerial
photograph of the ITC building and a map of the same area. The photograph
shows all visible objects, including parked cars and small temporary buildings.

From the photograph, it becomes clear that the weather as well as the time of the
day influenced its contents: the shadow to the north of the buildings obscures
other information. The map, on the other hand, only gives the outlines of
buildings and the streets in the surroundings. It is easier to interpret because
of selection/omission and classification of features. The symbolization chosen
highlights our building. Additional information, not available in the photograph,
has been added, such as the name of the major street: Hengelosestraat. Other
non-visible data, like cadastral boundaries or even the sewerage system, could
have been added in the same way. However, it also demonstrates that selection
means interpretation, and there are subjective aspects to that. In certain
circumstances, a combination of photographs and map elements can be useful.

Figure 7.4: Comparing an aerial photograph (a) and a map (b). Source:
Figure 5-1 in [30].

There is a relationship between the effectiveness of a map for a given purpose
and the map's scale. The Public Works department of a city council cannot use
a 1 : 250,000 map for replacing broken sewer-pipes, and the map of Figure 7.1
cannot be reproduced at scale 1 : 10,000. The map scale is the ratio between a
distance on the map and the corresponding distance in reality. Maps that show
much detail of a small area are called large-scale maps. The map in Figure 7.4
displaying the surroundings of the ITC-building is an example. The world map
in Figure 7.1 is a small-scale map. Scale indications on maps can be given verbally
like 'one-inch-to-the-mile', or as a representative fraction like 1 : 200,000,000
(1 cm on the map equals 200,000,000 cm (or 2,000 km) in reality), or by a graphic
representation like a scale bar as given in the map in Figure 7.4(b). The
advantage of using scale bars in digital environments is that their length also
changes when the map is zoomed in, or enlarged before printing.^ Sometimes it is
necessary to convert maps from one scale to another, but this may lead to problems
of (cartographic) generalization.
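
The arithmetic behind a representative fraction can be sketched in a few lines of Python. This is only an illustration of the ratio described above; the function names are hypothetical and not part of any GIS package.

```python
CM_PER_KM = 100_000  # 100,000 centimetres in one kilometre


def ground_distance_km(map_distance_cm: float, scale_denominator: float) -> float:
    """Ground distance (km) for a distance measured on a 1 : scale_denominator map."""
    return map_distance_cm * scale_denominator / CM_PER_KM


def map_distance_cm(ground_km: float, scale_denominator: float) -> float:
    """Distance on the map (cm) that represents a given ground distance (km)."""
    return ground_km * CM_PER_KM / scale_denominator


# 1 cm on the 1 : 200,000,000 world map corresponds to 2,000 km in reality.
print(ground_distance_km(1, 200_000_000))
# On a 1 : 250,000 map, 1 cm corresponds to 2.5 km.
print(ground_distance_km(1, 250_000))
```

A larger scale denominator means a smaller-scale map: the same centimetre on paper then covers more ground.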

Having discussed several characteristics of maps, it is now necessary to provide
a definition. Board [7] defines a map as

"a representation or abstraction of geographic reality. A tool for
presenting geographic information in a way that is visual, digital or
tactile."

The first sentence in this definition holds three key words. The "geographic
reality" represents the object of study, our world. "Representation" and
"abstraction" refer to models of these geographic phenomena. The second sentence
reflects the appearance of the map: can we see or touch it, or is it stored in a
database? In other words, a map is a reduced and simplified representation of
(parts of) the Earth's surface on a plane.

Traditionally, maps are divided into topographic maps and thematic maps. A
topographic map visualizes, limited by its scale, the Earth's surface as accurately
as possible. This may include infrastructure (e.g. railroads and roads), land use
(e.g. vegetation and built-up area), relief, hydrology, geographic names and a
reference grid. Figure 7.5 shows a small scale topographic map (with text
omitted) of Overijssel, the Dutch province in which Enschede is located.

^This explains why many of the maps in this book do not show a map scale.

Thematic maps represent the distribution of particular themes. One can
distinguish between socio-economic themes and physical themes. The map in
Figure 7.6(a), showing population density in Overijssel, is an example of the first
and the map in Figure 7.6(b), displaying the province's drainage areas, is an
example of the second. As can be noted, both thematic maps also contain
information found in a topographic map, so as to provide a geographic reference
to the theme represented. The amount of topographic information required
depends on the map theme. In general, a physical map will need more topographic
data than most socio-economic maps, which normally only need administrative
boundaries. The map with drainage areas should have added rivers and canals,
while adding relief would make sense as well.

Today's digital environment has diminished the distinction between topographic 
and thematic maps. Often, both topographic and thematic maps are stored in the 
database as separate data layers. Each layer contains data on a particular topic, 
and the user is able to switch layers on or off at will. 

The design of topographic maps is mostly based on conventions, of which some
date back several centuries. Examples are the use of blue to represent water,
green for forests, red for major roads, and black to denote urban or built-up
areas. The design of thematic maps, however, should be based on a set of
cartographic rules, also called cartographic grammar, which will be explained in
Sections 7.4 and 7.5 (but see also [32]).

Suppose that one wants to quantify land use changes between 1990 and the
current year. Two data sets (from 1990 and 2008) can be combined with an overlay
operation (see Section 6.3). The result of such a spatial analysis can be a
spatial data layer from which a map can be produced to show the differences. The
parameters used during the operation are based on models developed by the
application at hand. It is easy to imagine that maps can play a role during this
process of working with a GIS by showing intermediate and final results of the
GIS operations. Clearly, maps are no longer only the final product they used to
be.
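
As a minimal sketch of such an overlay, assume the two land use data sets are available as small rasters of equal size, here written as nested Python lists. The class names, cell values and function names are invented for illustration; a real GIS would perform the same cell-by-cell comparison on much larger rasters.

```python
from collections import Counter

# Hypothetical 3 x 3 land use rasters for the two dates (same extent, same cell size).
landuse_1990 = [
    ["forest", "forest", "arable"],
    ["forest", "arable", "arable"],
    ["urban",  "urban",  "arable"],
]
landuse_2008 = [
    ["forest", "arable", "arable"],
    ["urban",  "arable", "urban"],
    ["urban",  "urban",  "urban"],
]


def change_layer(old, new):
    """New raster: (old class, new class) where a cell changed, None where it did not."""
    return [
        [(a, b) if a != b else None for a, b in zip(row_old, row_new)]
        for row_old, row_new in zip(old, new)
    ]


def change_counts(old, new):
    """Count the changed cells per (old class, new class) transition."""
    counts = Counter()
    for row in change_layer(old, new):
        counts.update(cell for cell in row if cell is not None)
    return counts


for (before, after), n in change_counts(landuse_1990, landuse_2008).items():
    print(f"{before} -> {after}: {n} cell(s)")
```

The raster returned by `change_layer` is the kind of intermediate result that can itself be mapped, for instance with one colour per transition type.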

Maps can further be distinguished according to the dimensions of spatial data
that are graphically represented. GIS users also try to solve problems that deal
with three-dimensional reality or with change processes. This results in a
demand for other than just two-dimensional maps to represent geographic reality.
Three-dimensional and even four-dimensional (namely, including time) maps
are then required. New visualization techniques for these demands have been
developed. Figure 7.7 shows the dimensionality of geographic objects and their
graphic representation. Part (a) provides a map of the ITC building and its
surroundings, while part (b) shows a three-dimensional view of the building.
Figure 7.7(c) shows the effect of change, as three moments in time during the
construction of the building.
Figure 7.5: A topographic map of the province of Overijssel. Geographic
names and a reference grid have been omitted for reasons of clarity.

Figure 7.6: Thematic maps: (a) socio-economic thematic map, showing
population density of the province of Overijssel (higher densities in
darker tints); (b) physical thematic map, showing watershed areas of
Overijssel.

Figure 7.7: The dimensions of spatial data: (a) 2D, (b) 3D, (c) 3D with
time.
7.2 The visualization process


The characteristics of maps and their function in relation to the spatial data
handling process were explained in the previous section. In this context the
cartographic visualization process is considered to be the translation or
conversion of spatial data from a database into graphics. These are predominantly
map-like products. During the visualization process, cartographic methods and
techniques are applied. These can be considered to form a kind of grammar that
allows for the optimal design, production and use of maps, depending on
the application (see Figure 7.8).




[Diagram labels: spatial database; visualisation process; translation of
spatial data into maps guided by "How to say what to whom, and is it
effective?", applying cartographic methods and techniques]

Figure 7.8: The cartographic visualization process. Source: Figure 2-1
in [30].


The producer of these visual products may be a professional cartographer, but 
may also be a discipline expert, for instance, mapping vegetation stands using 
remote sensing images, or health statistics in the slums of a city. To enable the 
translation from spatial data into graphics, we assume that the data are available 
and that the spatial database is well-structured. 
The visualization process can vary greatly depending on where in the spatial
data handling process it takes place and the purpose for which it is needed.
Visualizations can be, and are, created during any phase of the spatial data
handling process as indicated before. They can be simple or complex, while the
production time can be short or long.

Some examples are the creation of a full, traditional topographic map sheet, a
newspaper map, a sketch map, a map from an electronic atlas, an animation
showing the growth of a city, a three-dimensional view of a building or a
mountain, or even a real-time map display of traffic conditions. Other examples
include 'quick and dirty' views of part of the database, the map used during the
updating process or during a spatial analysis. However, visualization can also be
used for checking the consistency of the acquisition process or even the database
structure. These visualization examples from different phases in the process of
spatial data handling demonstrate the need for an integrated approach to
geoinformatics. The environment in which the visualization process is executed can
vary considerably. It can be done on a stand-alone personal computer, a network
computer linked to an intranet, or on the World Wide Web (WWW/Internet).

In any of the examples just given, as well as in the maps in this book, the
visualization process is guided by the question "How do I say what to whom?" "How"
refers to cartographic methods and techniques. "I" represents the cartographer
or map maker, "say" deals with communicating in graphics the semantics of
the spatial data. "What" refers to the spatial data and its characteristics (for
instance, whether they are of a qualitative or quantitative nature). "Whom" refers
to the map audience and the purpose of the map: a map for scientists requires
a different approach than a map on the same topic aimed at children. This will
be elaborated upon in the following sections.
In the past, the cartographer was often solely responsible for the whole map
compilation process. During this process, incomplete and uncertain data often
still resulted in an authoritative map. The maps created by a cartographer had
to be accepted by the user. Cartography, for a long time, was very much driven
by supply rather than by demand. In some respects, this is still the case.
However, nowadays one accepts that just making maps is not the only purpose of
cartography.

The visualization process should also be tested on its effectiveness. To the
proposition "How do I say what to whom" we have to add "and is it effective?" Based
on feedback from map users, or knowledge about the effectiveness of
cartographic solutions, we can decide whether improvements are needed, and
derive recommendations for future application of those solutions. In particular,
with all the visualization options available, such as animated maps, multimedia
and virtual reality, it remains necessary to test the effectiveness of cartographic
methods and tools.

The visualization process is always influenced by several factors. Some of the
questions they raise can be answered by just looking at the content of the
spatial database:

• What will be the scale of the map: large, small, other? This introduces the
problem of generalization. Generalization addresses the meaningful
reduction of the map content during scale reduction.

• Are we dealing with topographic or thematic data? These two categories
traditionally resulted in different design approaches as was explained in
the previous section.

• More important for the design is the question of whether the data to be
represented are of a quantitative or qualitative nature.

We should understand that the impact of these factors may increase, since the 
compilation of maps by spatial data handling is often the result of combining 
different data sets of different quality and from different data sources, collected 
at different scales and stored in different map projections. 

Cartographers have all kinds of tools available to visualize the data. These tools
consist of functions, rules and habits. Algorithms used to classify the data or
to smooth a polyline are examples of functions. Rules tell us, for instance, to
use proportional symbols to display absolute quantities or to position an
artificial light source in the northwest to create a shaded relief map. Habits or
conventions (or traditions, as some would call them) tell us to colour the sea in
blue, lowlands in green and mountains in brown. The efficiency of these tools
will partly depend on the above-mentioned factors, and partly on what we are
used to.
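
The proportional symbol rule can be illustrated with a short Python sketch. It assumes the common convention that the area of a circle symbol, rather than its radius, is proportional to the quantity shown, so the radius grows with the square root of the value. The text above does not prescribe this particular scaling, and the function name and the 10 mm maximum radius are arbitrary choices for the example.

```python
import math


def symbol_radius(value: float, max_value: float, max_radius_mm: float = 10.0) -> float:
    """Radius of a circle symbol whose AREA is proportional to value.

    The largest value in the data set (max_value) gets radius max_radius_mm.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius_mm * math.sqrt(value / max_value)


# Hypothetical numbers of inhabitants for three municipalities.
inhabitants = {"A": 100_000, "B": 25_000, "C": 4_000}
largest = max(inhabitants.values())
for name, count in inhabitants.items():
    print(name, round(symbol_radius(count, largest), 2), "mm")
```

With this scaling, municipality B has a quarter of the inhabitants of A and is drawn with a circle of a quarter of the area, which means half the radius.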
7.3 Visualization strategies: present or explore?


Traditionally the cartographer's main task was the creation of good cartographic
products. This is still true today. The main function of maps is to communicate
geographic information, i.e. to inform the map user about location and nature of
geographic phenomena and spatial patterns. This has been the map's function
throughout history. Well-trained cartographers are designing and producing
maps, supported by a whole set of cartographic tools and theory as described
in cartographic textbooks [50, 32].

During the last decades, many others have become involved in making maps.
The widespread use of GIS has increased the number of maps tremendously
[35]. Even the spreadsheet software commonly used in offices today has mapping
capabilities, although most users are not aware of this. Many of these maps
are not produced as final products, but rather as intermediaries to support the
user in her/his work dealing with spatial data. The map has started to play a
completely new role: it is not only a communication tool, but also has become
an aid in the user's (visual) thinking process.

This thinking process is accelerated by the continued developments in hardware
and software. Media like DVD-ROMs and the WWW allow dynamic presentation
and also user interaction. These went along with changing scientific and societal
needs for georeferenced data and, as such, for maps. Users now expect
immediate and real-time access to the data; data that have become abundant in many
sectors of the geoinformation world. This abundance of data, seen as a
'paradise' by some sectors, is a major problem in other sectors. We lack the tools for
user-friendly queries and retrieval when studying the massive amount of
(spatial) data produced by sensors, which is now available via the WWW. A new
branch of science is currently evolving to deal with this problem of abundance.
In the geo-disciplines, it is called visual data mining.

These developments have given the term visualization an enhanced meaning.
According to the dictionary, it means 'to make visible' or 'to represent in
graphical form'. It can be argued that, in the case of spatial data, this has always been
the business of cartographers. However, progress in other disciplines has linked
the word to more specific ways in which modern computer technology can
facilitate the process of 'making visible' in real time. Specific software toolboxes
have been developed, and their functionality is based on two key words:
interaction and dynamics. A separate discipline, called scientific visualization, has
developed around it [37], and has also had an important impact on cartography.
It offers the user the possibility of instantaneously changing the appearance of
a map. Interaction with the map will stimulate the user's thinking and will add
a new function to the map. As well as communication, it will prompt thinking
and decision-making.

Developments in scientific visualization stimulated DiBiase [18] to define a model
for map-based scientific visualization, also known as geovisualization. It covers
both the presentation and exploration functions of the map (see Figure 7.9).
Presentation is described as 'public visual communication' since it concerns maps
aimed at a wide audience. Exploration is defined as 'private visual thinking'
because it is often an individual playing with the spatial data to determine its
significance. It is obvious that presentation fits into the traditional realm of
cartography, where the cartographer works on known spatial data and creates
communicative maps. Such maps are often created for multiple use. Exploration,
however, often involves a discipline expert who creates maps while dealing with
unknown data. These maps are generally for a single purpose, expedient in the
expert's attempt to solve a problem. While dealing with the data, the expert
should be able to rely on cartographic expertise, provided by the software or
some other means. Essentially, here the problem of translation of spatial data
into cartographic symbols also needs to be solved.

The above trends all have to do with what has been called the 'democratization
of cartography' by Morrison [40]. He explains it as

"using electronic technology, no longer does the map user depend
on what the cartographer decides to put on a map. Today the user
is the cartographer ... users are now able to produce analyses and
visualizations at will to any accuracy standard that satisfies them."

Figure 7.9: Private visual thinking and public visual communication. Source:
Modified from Figure 2-2 in [30].

Exploration means to search for spatial, temporal or spatio-temporal patterns,
relationships between patterns, or trends. In case of a search for patterns, a
domain expert may be interested in aspects like the distribution of a phenomenon,
the occurrence of anomalies, or the sequence of appearances and disappearances.
A search for relationships between patterns could include: changes in
vegetation indices and climatic parameters, or the location of deprived urban areas
and their distance to educational facilities. A search for trends could, for
example, focus on the development in distribution and frequency of landslides.
Maps not only enable these types of searches; findings may also trigger new
questions, and lead to new visual exploration (or analytical) acts.

What is unknown to one is not necessarily unknown to others. For instance,
browsing in Microsoft's Encarta World Atlas CD-ROM is an exploration for most
of us because of its wealth of information. With products like these, such
exploration takes place within boundaries set by the producers. Cartographic
knowledge is incorporated in the program, resulting in pre-designed maps. Some
users may feel this to be a constraint, but the same users will probably no longer
feel constrained as soon as they follow the web links attached to this electronic
atlas. This shows that the data, the users, and the use environment influence
one's view of what exploration entails.

To create a map, one selects relevant geographic data and converts these into
meaningful symbols for the map. Paper maps (in the past) had a dual function.
They acted as a database of the objects selected from reality, and communicated
information about these geographic objects. The introduction of computer
technology, and databases in particular, has created a split between these two
functions of the map. The database function is no longer required for the map,
although each map can still function like it. The communicative function of maps
has not changed.

The sentence "How do I say what to whom, and is it effective?" guides the
cartographic visualization process, and summarizes the cartographic communication
principle. Especially when dealing with maps in the realm of presentation
cartography (Figure 7.9), it is important to adhere to the cartographic design rules.
This is to guarantee that the resulting maps are easily understood by their users.
How does this communication process work? Figure 7.10 forms an illustration.
It starts with information to be mapped (the 'What' from the sentence). Before
anything can be done, the cartographer should get a feel for the nature of the
information, since this determines the graphical options. Cartographic
information analysis provides this. Based on this knowledge, the cartographer can
choose the correct symbols to represent the information in the map. S/he has a
whole toolbox of visual variables available to match symbols with the nature of
the data. For the rules, we refer to Section 7.4.

In 1967, the French cartographer Bertin developed the basic concepts of the
theory of map design, with his publication Sémiologie Graphique [5]. He provided
guidelines for making good maps. If ten professional cartographers were given
the same mapping task, and each would apply Bertin's rules (see Section 7.4.2),
this would still result in ten different maps. For instance, if the guidelines dictate
the use of colour, it is not stated which colour should be used. Still, all ten maps
could be of good quality.
Figure 7.10: The cartographic communication process, based on "How
do I say what to whom, and is it effective?" Source: Figure 5-5 in
[30].
Returning to the scheme, the map (the medium that does the 'say' in the
sentence) is read by the map users (the 'whom' from the sentence). They extract
some information from the map, represented by the box entitled 'Info-retrieved'.
From the figure it becomes clear that the boxes with 'Information' and
'Info-retrieved' do not overlap. This means the information derived by the map user
is not the same as the information that the cartographic communication process
started with. There may be several causes. Possibly, the original information
was not all used or additional information has been added during the process.
Omission of information could be deliberately caused by the cartographer, with
the aim of emphasizing the remaining information. Another possibility is that
the map user did not fully understand the map. Information gained during the
communication process could be due to the cartographer, who added extra
information to strengthen the already available information. It is also possible that
the map user has some prior knowledge on the topic or area, which allows them
to combine this prior knowledge with the knowledge retrieved from the map.
7.4 The cartographic toolbox

7.4.1 What kind of data do I have?

To derive the proper symbology for a map one has to execute a cartographic
data analysis. The core of this analysis process is to assess the characteristics of
the data to find out how they can be visualized, so that the map user properly
interprets them. The first step in the analysis process is to find a common
denominator for all the data. This common denominator will then be used as the
title of the map. For instance, if all data are related to land use, collected in 2005,
the title could be Landuse of ... 2005. Secondly, the individual component(s), such
as landuse, and probably relief, should be analysed and their nature described.
Later, these components should be visible in the map legend.

We have already discussed different kinds of data values on page 75, in relation
to the types of computations we can do on them. Here we take a look at the
different types of data in relation to how we might map or display them.

Data will be of a qualitative or quantitative nature. Qualitative data is also called
nominal or categorical data. This data exists as discrete, named values without
a natural order amongst the values. Examples are the different languages (e.g.
English, Swahili, Dutch), the different soil types (e.g. sand, clay, peat) or the
different land use categories (e.g. arable land, pasture). In the map, qualitative
data are classified according to disciplinary insights, such as a soil classification
system, and represented as basic geographic units: homogeneous areas associated
with a single soil type, recognized by the soil classification.

Quantitative data can be measured, either along an interval or ratio scale. For data
measured on an interval scale, the exact distance between values is known, but
there is no absolute zero on the scale. Temperature is an example: 40 °C is not
twice as warm as 20 °C, and 0 °C is not an absolute zero. Quantitative data with
a ratio scale does have a known absolute zero. An example is income: someone
earning $100 earns twice as much as someone with an income of $50. In order to
generate maps, quantitative data are often classified into categories according to
some mathematical method.
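
One simple example of such a mathematical method is equal-interval classification, which divides the range between the lowest and the highest value into classes of equal width. The Python sketch below is only an illustration; the text does not single out this method, and the function names and population density values are invented for the example.

```python
from bisect import bisect_left


def equal_interval_breaks(values, n_classes):
    """Upper class limits (except the last) that split [min, max] into equal-width classes."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / n_classes
    return [lowest + i * width for i in range(1, n_classes)]


def classify(value, breaks):
    """Class index (0 = lowest class); a value equal to a break falls in the lower class."""
    return bisect_left(breaks, value)


# Hypothetical population densities (inhabitants per square km) for five municipalities.
densities = [10, 50, 120, 200, 410]
breaks = equal_interval_breaks(densities, n_classes=4)
print(breaks)
print([classify(d, breaks) for d in densities])
```

Each class index would then be mapped to one tint, for instance from light (class 0) to dark (class 3). Other methods, such as classes with equal numbers of observations, yield different class limits for the same data and therefore a different map.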

In between qualitative and quantitative data, one can distinguish ordinal data.
These data are measured along a relative scale, based on hierarchies. For
instance, one knows that one value is 'more' than another value, such as 'warm'
versus 'cool'. Another example is a hierarchy of road types: 'highway', 'main
road', 'secondary road' and 'track'. The different types of data are summarized
in Table 7.1.


Measurement scale       Nature of data

Nominal, categorical    Data of different nature / identity of things
                        (qualitative)
Ordinal                 Data with a clear element of order, though not
                        quantitatively determined (ordered)
Interval                Quantitative information with arbitrary zero
Ratio                   Quantitative data with absolute zero

Table 7.1: Differences in the nature of data and their measurement scales
7.4.2 How can I map my data?

Basic elements of a map, irrespective of the medium on which it is displayed, are 
point symbols, line symbols, area symbols, and text. The appearance of point, 
line, and area symbols can vary depending on their nature. Most maps in this 
book show symbols in different size, shape and colour. Points can vary in form 
or colour to represent the location of shops or they can vary in size to represent 
aggregated values (like number of inhabitants) for an administrative area. Lines 
can vary in colour to distinguish between administrative boundaries and rivers, 
or vary in shape to show the difference between railroads and roads. Areas 
follow the same principles: difference in colour distinguishes between different 
vegetation stands. 

Although the variations in symbol appearance are only limited by the
imagination, they can be grouped together in a few categories. Bertin [5] distinguished
six categories, which he called the visual variables and which may be applied to
point, line and area symbols. As illustrated in Figure 7.11, they are:

• Size, 

• Value (lightness), 

• Texture, 

• Colour, 

• Orientation and 

• Shape. 
Figure 7.11: Bertin's six visual variables illustrated. Source: Plate 1 in [31].
These visual variables can be used to make one symbol different from another.
In doing this, map makers in principle have free choice, provided they do not 
violate the rules of cartographic grammar. They do not have that choice when 
deciding where to locate the symbol in the map. The symbol should be located 
where features belong. Visual variables influence the map user's perception in 
different ways. What is perceived depends on the human capacity to see or 
perceive: 


• What is of equal importance (e.g. all red symbols represent danger), 

• Order (e.g. the population density varies from low to high, represented
by light and dark colour tints, respectively),

• Quantities (e.g. symbols changing in size with small symbols for small 
amounts), or 

• An instant overview of the mapped theme. 


There is an obvious relationship between the nature of the data to be mapped
and the 'perception properties' of visual variables. In Table 7.2, the measurement
scales as defined in Table 7.1 are linked to the visual variables displayed in
Figure 7.11. 'Dimensions of the plane' is added to the list of visual variables; it is
the basis, used for the proper location of symbols on the plane (map). The perception
properties of the remaining visual variables have been added. The next
section discusses some typical mapping problems and demonstrates the above.
Perception          Visual variable            Measurement scales
properties                                     nominal  ordinal  interval  ratio
------------------  -------------------------  -------  -------  --------  -----
                    dimensions of the plane       X        X        X        X
order & quantities  size                                   X        X        X
order               (grey) value                           X        X
order               grain/texture                          X        X
equal importance    colour hue                    X
equal importance    orientation                   X
equal importance    shape                         X

Table 7.2: Measurement scales linked to visual variables based on perception
properties
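
The link between measurement scales and visual variables can also be written down as a simple lookup. The sketch below is only an illustration: the dictionary reflects one reading of the X marks in Table 7.2, and the names SUITABLE_SCALES and variables_for are hypothetical, not part of any GIS package.

```python
# One reading of Table 7.2: visual variable -> measurement scales it suits.
# The assignments are an assumption based on the X marks in the table.
SUITABLE_SCALES = {
    "dimensions of the plane": {"nominal", "ordinal", "interval", "ratio"},
    "size": {"ordinal", "interval", "ratio"},
    "value": {"ordinal", "interval"},
    "texture": {"ordinal", "interval"},
    "colour hue": {"nominal"},
    "orientation": {"nominal"},
    "shape": {"nominal"},
}


def variables_for(scale):
    """Return, alphabetically, the visual variables marked for a scale."""
    return sorted(v for v, scales in SUITABLE_SCALES.items() if scale in scales)


if __name__ == "__main__":
    for scale in ("nominal", "ordinal", "interval", "ratio"):
        print(scale, "->", ", ".join(variables_for(scale)))
```

Such a lookup could, for instance, be used to warn a map maker who selects colour hue for ratio data.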
7.5 How to map ... ?

The subsections in this How to map ... section deal with characteristic mapping
problems. We first describe a problem and briefly discuss a solution based on
cartographic rules and guidelines. The need to follow these rules and guidelines
is illustrated by some maps that have been wrongly designed, but are nevertheless
commonly found.
7.5.1 How to map qualitative data

If, after a long fieldwork period, one has finally delineated the boundaries of a 
province's watersheds, one likely is interested in a map showing these areas. The 
geographic units in the map will have to represent the individual watersheds. In 
such a map, each of the watersheds should get equal attention, and none should 
stand out above the others. 



Figure 7.12: A good example of mapping qualitative data


The application of colour would be the best solution since it has characteristics
that allow one to quickly differentiate between different geographic units. However,
since none of the watersheds is more important than the others, the colours
used have to be of equal visual weight or brightness. Figure 7.12 gives an example
of a correct map. The readability is influenced by the number of displayed
geographic units. In this example, there are about 15. When this number is much
higher, the map, at the scale displayed here, will become too cluttered. The map
can also be made by filling the watershed areas with different forms (like small
circles, squares, triangles, etc.) in one colour (e.g. black for a monochrome map),
as an application of the visual variable shape. The number of geographic units
that can be displayed is then even more critical.

Figure 7.13 shows two examples of how not to create such a map. In (a), several
tints of black are used, as an application of the visual variable 'value'. Looking at
the map may cause perceptual confusion since the map image suggests differences
in importance that are not there in reality. In Figure 7.13(b), colours are
used instead. However, where most watersheds are represented in pastel tints,
one of them stands out by its bright colour. This gives the map an unbalanced
look. The viewer's eye will be distracted by the bright colour, so that the other
areas receive unjustifiably less attention.
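
The requirement that qualitative colours differ in hue but not in visual weight can be approximated in code. The following is a minimal sketch, assuming that the lightness channel of the HLS colour model is an acceptable stand-in for visual weight; it is only a rough proxy, since a yellow and a blue of equal HLS lightness are not perceived as equally bright. The function name qualitative_palette and the default values are illustrative.

```python
import colorsys


def qualitative_palette(n, lightness=0.7, saturation=0.5):
    """Return n hex colours that differ in hue only.

    Lightness and saturation are held constant so that, to a first
    approximation, no geographic unit stands out above the others.
    """
    colours = []
    for i in range(n):
        hue = i / n  # spread the hues evenly around the colour wheel
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colours.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colours


if __name__ == "__main__":
    # About 15 watersheds, as in the example discussed above.
    print(qualitative_palette(15))
```

With many units the hues come close together, which mirrors the readability limit mentioned in the text.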



Figure 7.13: Two examples of wrongly designed qualitative maps: (a) misuse of
tints of black; (b) misuse of bright colours
7.5.2 How to map quantitative data

When, after executing a census, one would for instance like to create a map with
the number of people living in each municipality, one deals with absolute quantitative
data. The geographic units will logically be the municipalities. The final
map should allow the user to determine the amount per municipality and also
offer an overview of the geographic distribution of the phenomenon. To reach
this objective, the symbols used should have quantitative perception properties.
Symbols varying in size fulfil this demand. Figure 7.14 shows the final map for
the province of Overijssel.
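
For symbols varying in size, a common choice is to make the area of a circle, rather than its radius, proportional to the mapped quantity. The sketch below illustrates this under that assumption; the function name, the maximum radius and the population figures are hypothetical and are not taken from Figure 7.14.

```python
import math


def circle_radius(value, max_value, max_radius=20.0):
    """Radius of a proportional circle whose AREA scales with value.

    Area is proportional to radius squared, so the radius has to grow
    with the square root of the value.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)


if __name__ == "__main__":
    # Hypothetical population counts for three municipalities.
    population = {"A": 160000, "B": 40000, "C": 10000}
    largest = max(population.values())
    for name, count in population.items():
        print(name, round(circle_radius(count, largest), 1))
```

A municipality with a quarter of the largest population thus gets a circle with half the radius, and a quarter of the area.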



Figure 7.14: Mapping absolute quantitative data


The fact that it is easy to make errors can be seen in Figure 7.15. In 7.15(a), different
tints of green (the visual variable 'value') have been used to represent absolute
population numbers. The reader might get a reasonable impression of the individual
amounts but not of the actual geographic distribution of the population,
as the size of the geographic units will influence the perceptional properties too
much. Imagine a small and a large unit having the same number of inhabitants.
The large unit would visually attract more attention, giving the impression there
are more people than in the small unit. Another issue is that the population is
not necessarily homogeneously distributed within the geographic units. Colour
has also been misused in Figure 7.15(b). The applied four-colour scheme makes
it impossible to infer whether red represents more populated areas than blue.
It is impossible to instantaneously answer a question like "Where do most people
in Overijssel live?"

Figure 7.15: Poorly designed maps displaying absolute quantitative data:
(a) wrong use of green tints for absolute population figures; (b) incorrect
use of colour

Figure 7.16: Mapping relative quantitative data

On the basis of absolute population numbers per municipality and their geographic
size, we can also generate a map that shows population density per municipality.
We then deal with relative quantitative data. The numbers now have
a clear relation with the area they represent. The geographic unit will again be
the municipality. The aim of the map is to give an overview of the distribution of
the population density. In the map of Figure 7.16, value has been used to display
the density from low (light tints) to high (dark tints). The map reader will
automatically and at a glance associate the dark tints with high density and
the light tints with low density.
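
The step from absolute counts to a density map with ordered tints can be sketched as follows. This is an illustration only: equal-interval classification is just one of several possible classification methods, the grey levels are arbitrary, and the input figures and function names are hypothetical.

```python
def density(population, area_km2):
    """Relative quantity: inhabitants per square kilometre."""
    return population / area_km2


def classify_equal_interval(values, n_classes):
    """Assign each value a class index from 0 (lowest) to n_classes - 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes or 1.0  # avoid zero width if all values are equal
    return [min(int((v - lo) / width), n_classes - 1) for v in values]


def grey_tint(class_index, n_classes):
    """Light grey for the lowest class, dark grey for the highest."""
    lightest, darkest = 230, 40
    step = (lightest - darkest) / max(n_classes - 1, 1)
    level = round(lightest - class_index * step)
    return "#{0:02x}{0:02x}{0:02x}".format(level)


if __name__ == "__main__":
    # Hypothetical (population, area in km2) pairs for three municipalities.
    units = [(5000, 500.0), (20000, 400.0), (30000, 300.0)]
    densities = [density(p, a) for p, a in units]
    classes = classify_equal_interval(densities, 3)
    for d, c in zip(densities, classes):
        print(d, c, grey_tint(c, 3))
```

Because the tint darkens steadily with the class index, the order in the data is preserved in the order of the tints.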

Figure 7.17(a) shows the effect of incorrect application of the visual variable 
value. In this map, the value tints are out of sequence. The user has to go through 
quite some trouble to find out where in the province the high-density areas can 
be found. Why should mid-red represent areas with a higher population density 
than dark-red? 
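
Whether the tints of a legend are in sequence can be checked mechanically. The sketch below assumes that the mean of the red, green and blue channels is an adequate approximation of lightness, which is a simplification; the function names and the example colours are illustrative.

```python
def lightness(hex_colour):
    """Approximate lightness as the mean of the R, G and B channels (0-255)."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return (r + g + b) / 3


def tints_in_sequence(legend_colours):
    """True if colours, listed from lowest to highest class, get steadily darker."""
    values = [lightness(c) for c in legend_colours]
    return all(a > b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    ordered = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]
    shuffled = ["#fee5d9", "#fb6a4a", "#fcae91", "#cb181d"]
    print(tints_in_sequence(ordered))   # light to dark red
    print(tints_in_sequence(shuffled))  # mid-red placed before light red
```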

In Figure 7.17(b) colour has been used in combination with value. The first impression
of the map reader would be to think the brown areas represent the areas
with the highest density. A closer look at a legend would tell that this is not the
case, and that those areas are represented by another colour that did not 'speak
for itself'.

If one studies the badly designed maps carefully, the information can be derived, 
in one way or another, but it would take quite some effort. Proper application 
of cartographic guidelines will guarantee that this will go much more smoothly 
(e.g. faster and with less chance of misunderstanding). 



Figure 7.17: Badly designed maps representing relative quantitative data:
(a) lightness values used out of sequence; (b) colour should not be used
7.5.3 How to map the terrain elevation

Terrain elevation can be mapped using different methods. Often, one will have
collected an elevation data set for individual points like peaks, or other characteristic
points in the terrain. Obviously, one can map the individual points
and add the height information as text. However, a contour map, in which the
lines connect points of equal elevation, is generally used. To visually improve
the information content of such a map the space between the contour lines can
be filled with colour and value information following a convention, e.g. green
for low elevation and brown for high elevation areas. This technique is known
as hypsometric or layer tinting. Even more advanced is the addition of shaded
relief. This will improve the impression of the three-dimensional relief (see
Figure 7.18).
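
Layer tinting can be sketched in a few lines. The example assumes a fixed contour interval and a linear blend from a green to a brown; the colour values, the default elevation range and the function name layer_tint are illustrative choices, not a cartographic standard.

```python
import math


def layer_tint(elevation, interval=100.0, min_elev=0.0, max_elev=1000.0,
               low_rgb=(60, 140, 60), high_rgb=(140, 90, 40)):
    """Return one RGB tint per contour band, green (low) to brown (high)."""
    if interval <= 0 or max_elev <= min_elev:
        raise ValueError("interval must be > 0 and max_elev > min_elev")
    # Snap the elevation to the contour line bounding its band from below,
    # so that the whole space between two contours gets the same tint.
    band_floor = min_elev + math.floor((elevation - min_elev) / interval) * interval
    t = (band_floor - min_elev) / (max_elev - min_elev)
    t = min(max(t, 0.0), 1.0)  # clamp elevations outside the range
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low_rgb, high_rgb))


if __name__ == "__main__":
    for z in (0.0, 250.0, 299.0, 300.0, 1000.0):
        print(z, layer_tint(z))
```

Elevations of 250 m and 299 m fall between the same two contour lines and therefore receive the same tint.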

The shaded relief map uses the full three-dimensional information to create
shading effects. This map, represented on a two-dimensional surface, can also be
floated in three-dimensional space to give it a real three-dimensional appearance
of a 'virtual world', as shown in Figure 7.18(d). Looking at such a representation
one can immediately imagine that it will not always be effective. Certain (low)
objects in the map will easily disappear behind other (higher) objects.

Interactive functions are required to manipulate the map in three-dimensional
space in order to look behind some objects. These manipulations include panning,
zooming, rotating and scaling. Scaling is needed, particularly along the
z-axis, since some maps require small-scale elevation resolution, while others
require large-scale resolution, i.e. vertical exaggeration. One can even imagine
that other geographic, three-dimensional objects (for instance, the built-up area
of a city and individual houses) have been placed on top of the terrain model,
as is done in Google Earth. Of course, one can also visualize objects below
the surface in a similar way, but this is more difficult because the data to describe
underground objects are sparsely available.
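
Scaling along the z-axis amounts to multiplying only the elevation of each point by a factor. A minimal sketch, assuming points are stored as (x, y, z) tuples; the function name and the sample coordinates are hypothetical.

```python
def exaggerate(points, factor):
    """Scale only the z coordinate of (x, y, z) points; x and y are untouched."""
    return [(x, y, z * factor) for x, y, z in points]


if __name__ == "__main__":
    terrain = [(0.0, 0.0, 10.0), (100.0, 0.0, 25.0)]
    # A factor of 3 makes gentle relief three times as pronounced.
    print(exaggerate(terrain, 3.0))
```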

Socio-economic data can also be viewed in three dimensions. This may result in
dramatic images, which will be long remembered by the map user. Figure 7.19
shows the absolute population figures of Overijssel in three dimensions. Instead
of proportionally sized circles to depict the number of people living in a municipality
(as we did in Figure 7.14), the proportional height of a municipality now
indicates total population. The image clearly shows that Enschede (the large
column in the lower right) is by far the most populous municipality.
Figure 7.18: Visualization of terrain elevation: (a) contour map; (b) map
with layer tints; (c) shaded relief map; (d) 3D view of the terrain

Figure 7.19: Quantitative data visualized in three dimensions
7.5.4 How to map time series

Advances in spatial data handling have not only made the third dimension part
of GIS routines. Nowadays, the handling of time-dependent data is also part
of these routines. This has been caused by the increasing availability of data
captured at different periods in time. Next to this data abundance, the GIS community
wants to analyse changes caused by real world processes. To that end,
single time slice data are no longer sufficient, and the visualization of these processes
cannot be supported with only static paper maps.

Mapping time means mapping change. This may be change in a feature's geometry,
in its attributes or both. Examples of changing geometry are the evolving
coastline of the Netherlands (as displayed in Figure 7.3), the location of Europe's
national boundaries, or the position of weather fronts. The changes of a land parcel's
owner, land use, or changes in road traffic intensity are examples of changing
attributes. Urban growth is a combination of both. The urban boundaries
expand and simultaneously the land use shifts from rural to urban. If maps are
to represent events like these, they should be suggestive of such change.

This implies the use of symbols that are perceived as representing change. Examples
of such symbols are arrows that have an origin and a destination. They
are used to show movement and their size can be an indication of the magnitude
of change. Size changes can also be applied to other point and line symbols to
show increase and decrease over time. Specific point symbols such as 'crossed
swords' (battle) or 'lightning' (riots) can be found to represent dynamics in historic
maps. Another alternative is the use of the visual variable value (expressed
as tints). In a map showing the development of a town, dark tints represent
old built-up areas, while new built-up areas are represented by light tints (see
Figure 7.20(a)).

It is possible to distinguish between three temporal cartographic techniques (see
Figure 7.20):

1. Single static map: Specific graphic variables and symbols are used to indicate
change or represent an event. Figure 7.20(a) applies the visual variable
value to represent the age of the built-up areas;

2. Series of static maps: A single map in the series represents a 'snapshot' in
time. Together, the maps depict a process of change. Change is perceived
by the succession of individual maps depicting the situation in successive
snapshots. It could be said that the temporal sequence is represented by
a spatial sequence, which the user has to follow, to perceive the temporal
variation. The number of images should be limited since it is difficult for
the human eye to follow long series of maps (Figure 7.20(b));

3. Animated map: Change is perceived to happen in a single image by displaying
several snapshots after each other just like a video cut with successive
frames. The difference with the series of maps is that the variation can be
deduced from real 'change' in the image itself, not from a spatial sequence
(Figure 7.20(c)).

For the user of a cartographic animation, it is important to have tools available
that allow for interaction while viewing the animation. Seeing the animation
play will often leave users with many questions about what they have seen. Just
replaying the animation is not sufficient to answer questions like "What was the
position of the coastline in the north during the 15th century?"
Most of the general software packages for viewing animations already offer facilities
such as 'pause' (to look at a particular frame) and '(fast-)forward' and
'(fast-)backward', or step-by-step display. More options have to be added, such
as a possibility to directly go to a certain frame based on a task like: 'Go to 1850'.
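
A 'Go to 1850' function only requires that each frame of the animation carries a date. The sketch below assumes the frames are stored in chronological order and that the wanted frame is the latest one dated at or before the requested year; the list of years and the function name are hypothetical.

```python
import bisect


def frame_for_year(frame_years, year):
    """Index of the latest frame dated at or before `year`.

    `frame_years` must be sorted in ascending order. If the requested
    year precedes all frames, the first frame is returned.
    """
    i = bisect.bisect_right(frame_years, year) - 1
    return max(i, 0)


if __name__ == "__main__":
    years = [1500, 1650, 1800, 1850, 1900, 2000]  # hypothetical snapshots
    print(frame_for_year(years, 1850))  # exact match
    print(frame_for_year(years, 1870))  # falls between two snapshots
```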
Figure 7.20: Mapping change; example of the urban growth of the city of
Maastricht, The Netherlands: (a) single map, in which tints represent age of
the built-up area; (b) series of maps; (c) (simulation of an) animation.
7.6 Map cosmetics

Most maps in this chapter are correct from a cartographic grammar perspective.
However, many of them lack the additional information needed to be fully understood
that is usually placed in the margin of printed maps. Each map should
have, next to the map image, a title, informing the user about the topic visualized.
A legend is necessary to understand how the topic is depicted. Additional
marginal information to be found on a map is a scale indicator, a north arrow for
orientation, the map datum and map projection used, and some lineage information
(such as data sources, dates of data collection, methods used, etc.).

Further information can be added that indicates when the map was issued, and
by whom (author / publisher). All this information allows the user to obtain an
impression of the quality of the map, and is comparable with metadata describing
the contents of a database or data layer.
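
Since the marginal information is comparable with metadata, it can be recorded and checked in the same way. The sketch below is a hypothetical record structure, not the schema of any metadata standard; the field names simply follow the elements listed above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MapMarginalia:
    """Marginal information that accompanies the map image (illustrative)."""
    title: Optional[str] = None
    legend: Optional[str] = None
    scale: Optional[str] = None
    north_arrow: bool = False
    datum: Optional[str] = None
    projection: Optional[str] = None
    lineage: Optional[str] = None
    issued: Optional[str] = None
    author: Optional[str] = None


def missing_elements(marginalia):
    """Names of the elements that have not been filled in."""
    return [f.name for f in fields(marginalia) if not getattr(marginalia, f.name)]


if __name__ == "__main__":
    m = MapMarginalia(title="Population density", legend="tints, light to dark",
                      scale="1:250,000", north_arrow=True)
    print(missing_elements(m))
```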

Figure 7.21 illustrates these map elements. On paper maps, these elements (if all
relevant) have to appear next to the map face itself. Maps presented on screen often
go without marginal information, partly because of space constraints. However,
on-screen maps are often interactive, and clicking on a map element may
reveal additional information from the database. Legends and titles are often
available on demand as well.

The map in Figure 7.21 is one of the first in this chapter that has text included.
Figure 7.22 is another example. Text is used to transfer information in addition
to the symbols used. This can be done by the application of the visual variables
to the text as well. In Figure 7.22 an example can be found. Italics (cf. the visual
variable of orientation) have been used for building names to distinguish them
from road names. Another common example is the use of colour to differentiate
(at nominal level) between hydrographic names (in blue) and other names (in
black). The text should also be placed in a proper position with respect to the
object to which it refers.

Maps constructed via the basic cartographic guidelines are not necessarily visually
appealing maps. Although well-constructed, they might still look sterile.
The design aspect of creating appealing maps also has to be included in the visualization
process. 'Appealing' does not only mean having nice colours. One
of the keywords here is contrast. Contrast will increase the communicative role
of the map since it creates a hierarchy in the map contents, assuming that not all
information has equal importance. This design trick is known as visual hierarchy
or the figure-ground concept. The need for visual hierarchy in a map is best understood
when looking at the map in Figure 7.23(a), which just shows lines. The
map of the ITC building and surroundings in part (b) is an example of a map
that has visual hierarchy applied. The first object to be noted will be the ITC
building (the darkest patches in the map) followed by other buildings, with the
road on a lower level and the parcels at the lowest level.
Figure 7.21: The paper map and its (marginal) information. Source: Figure 5-10
in [30].

Figure 7.22: Text in the map

Figure 7.23: Visual hierarchy and the location of the ITC building: (a) hierarchy
not applied; (b) hierarchy applied
7.7 Map dissemination


The map design will not only be influenced by the nature of the data to be
mapped or the intended audience (the 'what' and 'whom' from "How do I say
What to Whom, and is it Effective"); the output medium also plays a role. Traditionally,
maps were produced on paper, and many still are.

Currently, most maps are presented on screen, for a quick view, for an internal
presentation or for presentation on the WWW. Compared to maps on paper, on-screen
maps have to be smaller, and therefore their contents should be carefully
selected. This might seem a disadvantage, but presenting maps on-screen offers
very interesting alternatives. In one of the previous paragraphs, we discussed
that the legend only needs to be a mouse click away. A mouse click could also
open the link to a database, and reveal much more information than a paper
map could ever offer. Links to other than tabular or map data could also be
made available.

Maps and multimedia (photography, sound, video, animation) can be integrated.
Some of today's electronic atlases, such as the Encarta World Atlas, are good examples
of how multimedia elements can be integrated with the map. Pointing
to a country on a world map starts the national anthem of the country or shows
its flag. It can be used to explore a country's language; moving the mouse would
start a short sentence in the region's dialects.

The World Wide Web is nowadays a common medium used to present and disseminate
spatial data. Here, maps can play their traditional role, for instance
to show the location of objects, or provide insight into spatial patterns, but because
of the nature of the internet, the map can also function as an interface
to additional information. Geographic locations on the map can be linked to
photographs, text, sound or other maps, perhaps even functions such as on-line
booking services. Maps can also be used as 'previews' of spatial data products
to be acquired through a spatial data clearinghouse that is part of a Spatial Data
Infrastructure. For that purpose we can make use of geo-webservices, which can
provide interactive map views as an intermediate between data and web browser
(please refer to Section 3.2.3).


Figure 7.24: Classification of maps on the WWW. Source: Figure 1-2 in [30].
See also kartoweb.itc.nl/webcartography/webmaps/classification.htm


How can maps be used on the WWW? We can distinguish several methods that
differ in terms of necessary technical skills from both the user's and provider's
perspective. The overview given here (see Figure 7.24) can only be a current
state of affairs, since developments on the WWW are tremendously fast. An
important distinction is the one between static and dynamic maps. Many static
maps on the web are view-only. Organizations, such as map libraries or tourist
information providers, often make their maps available in this way. This form
of presentation can be very useful, for instance, to make historical maps more
widely accessible. Static, view-only maps can also serve to give web surfers a
preview of the products that are available from organizations, such as National
Mapping Agencies.

When static maps offer more than view-only functionality, they may present an
interactive view to the user by offering zooming, panning, or hyperlinking to
other information. The much-used 'clickable map' is an example of the latter
and is useful to serve as an interface to spatial data. Clicking on geographic
objects may lead the user to quantitative data, photographs, sound or video or
other information sources on the Web. The user may also interactively determine
the contents of the map, by choosing data layers, and even the visualization
parameters, by choosing symbology and colours.

Dynamic maps are about change; change in one or more of the spatial data
components. On the WWW, several options to play animations are available.
The so-called animated GIF can be seen as a view-only version of a dynamic map.
A sequence of bitmaps, each representing a frame of an animation, is positioned
one after another, and the WWW browser will continuously repeat the animation.
This can be used, for example, to show the change of weather over the last day.

Slightly more interactive versions of this type of map are those to be played
by media players, for instance those in QuickTime format, or as a Flash movie.
Plug-ins to the web browser define the interaction options, which are often limited
to simple pause, backward and forward play. Such animations do not use
any specific WWW-environment parameters and have equal functionality in the
desktop environment. The WWW also allows for the fully interactive presentation
of 3D models. The Virtual Reality Modeling Language (VRML), for instance,
can be used for this purpose. It stores a true 3D model of the objects, not just a
series of 3D views.


Summary 


Maps are the most efficient and effective means to inform us about spatial information.
They locate geographic objects, while the shape and colour of signs and
symbols representing the objects inform about their characteristics. They reveal
spatial relations and patterns, and offer the user insight into and an overview of the
distribution of particular phenomena. An additional characteristic of particular
on-screen maps is that they are often interactive and have a link to a database,
and as such allow for more complicated queries.

Maps are the result of the visualization process. Their design is guided by "How
do I say what to whom and is it effective?" Answering this question will inform
the map maker about the characteristics of the data to be mapped, as well as
the purpose of the map. This is necessary to find the proper symbology. The
purpose could be to present the data to a wide audience or to explore the data
to obtain better understanding. Cartographers have all kinds of tools available
to create appropriate visualizations. These tools consist of functions, rules and
habits, together called the cartographic grammar.

This chapter has discussed some characteristic mapping problems from the perspective
of "How to map ...". First, the problem is described, followed by a brief
discussion of the potential solution based on cartographic rules and guidelines.
The need to follow these rules and guidelines is illustrated by some maps that
have been wrongly designed but are commonly found. The problems dealt with
are "How to map qualitative data" (think of, for instance, soil or geological
maps); "How to map quantitative data" (such as census data); "How to map the
terrain" (dealing with relief, and informing about three-dimensional mapping
options); and "How to map time series" (such as urban growth presented in
animations). Animations are well suited to display spatial change.

Map design will not only depend on the nature of the data to be mapped or the
intended audience but also on the output medium. Traditionally, maps were
produced on paper, and many still are. Currently, most maps are presented
on screen, for a quick view, for an internal presentation or for presentation on
the WWW. Each output medium has its own specific design criteria. All maps
should have an appealing design and an accessible title that informs the user
about the topic visualized.
Questions

1. Suppose one has two maps, one at scale 1:10,000, and another at scale
1:1,000,000. Which of the two maps can be called a large-scale map, and
which a small-scale map?



2. Describe the difference between a topographic map and a thematic map. 



3. Describe in one sentence, or in one question, the main problem of the cartographic
visualization process.



4. Explain the content of Figure 7.8 in terms of that of Figure 3.1. 



5. Which four main types of thematic data can be distinguished on the basis 
of their measurement scales? 



6. Which are the six visual variables that allow one to distinguish cartographic
symbols from each other?



7. Describe a number of ways in which a three-dimensional terrain can be
represented on a flat map display.



8. In Section 7.5.4, we discussed three techniques for mapping changes over
time. We already discussed the issue of change detection, and illustrated it
in Figure 2.25. What technique was used there? Elaborate on how appropriate
the two alternative techniques would have been in that example.



9. Describe different techniques of cartographic output from the user's perspective.



10. Explain the difference between static maps and dynamic maps. 
Abbreviations & Foreign words

2D        Two-dimensional. Typically applied to (aspects of) GIS applications
          that view their phenomena in a two-dimensional space (a plane), where
          coordinates are pairs (x, y).

2½D       Two-and-a-half-dimensional. Typically applied to (aspects of) GIS
          applications that view their phenomena in a two-dimensional space
          (a plane), where coordinates are pairs (x, y), but where some coordinates
          are associated also with a single elevation value. This is
          different from 3D GIS because with any (x, y) coordinate pair, a 2½D
          system can at most associate only one elevation. A TIN structure, for
          instance, is a typical 2½D structure, as it only determines single elevation
          values for single locations.

3D        Three-dimensional. Typically applied to (aspects of) GIS applications
          that view their phenomena in a three-dimensional space, where coordinates
          are triplets (x, y, z).

ADSL      Asymmetric Digital Subscriber Lines. A new technology of data transmission
          used to deliver high-rate digital data over existing ordinary
          phone-lines. ADSL facilitates the simultaneous use of normal telephone
          services, ISDN, and high speed data transmission, e.g. video.

ArcInfo   A GIS software package developed in the 1980's and 1990's at ESRI.
          As the name indicates ('Arc'), historically more vector-based than
          raster-based.
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ASCII American Standard Code for Information Interchange; an encoding 
of text characters into integer values represented as bytes. So-called 
'plain text' files usually are encoded in ASCII. 

AVHRR Advanced Very High Resolution Radiometer; a broad-band scaimer, 
sensing in the visible, near-infrared, and thermal infrared portions 
of the electromagnetic spectrum, carried on NOAA's Polar Orbiting 
Environmental Satellites (POES). 

bps Bits per second. The unit in which data transmission rates are mea¬ 
sured. Eight bits constitute a byte, which is used to represent a single 
character in a text document. The usual unit is now Mbps: million bits 
per second. A data rate of 1Mbps allows to transmit about 40 pages 
of plain text per second. 

CIO Conventional International Origin. The mean position of the pole in 
the year 1903 (based on observations between 1900 and 1905) used to 
compensate for changes in position of the Earth's rotational axis over 
time (referred to as polar motion). 

DBMS Database Management System. 

DEM Digital Elevation Model. 

dpi Dots per inch; the unit of scanner (or printer) resolution, expressed as
how many pixels can be read (printed) per inch.

DTM Digital Terrain Model. 

e.g. For example (exempli gratia).

ESRI Environmental Systems Research Institute, Inc. The American company,
based in Redlands, California, that created and develops ArcGIS.

GDOP Geometric Dilution Of Precision. 

GIS Geographic Information System. 

GLONASS Global Navigation Satellite System. The Russian satellite-based
positioning system.

GMT Greenwich Mean Time. 

GPS Global Positioning System. 

GRS80 Geodetic Reference System 1980. 

HSDPA High-Speed Downlink Packet Access: A protocol for (fast) cellular 
phone data transmission. 

i.e. That is; meaning (id est).

ILWIS Integrated Land and Water Information System. A GIS software package
developed in the 1980s and 1990s at ITG. Historically more
raster-based than vector-based.

ISO International Organization for Standardization. 

ITRF International Terrestrial Reference Frame. 

ITRS International Terrestrial Reference System.

NDVI Normalized Difference Vegetation Index.

NOAA National Oceanic and Atmospheric Administration; an institute falling
under the U.S. Department of Commerce which monitors the Earth's
environment through satellite imagery.

OGC Open Geospatial Consortium.

SA Selective Availability.

SDSS Spatial Decision Support System(s).

SQL Structured Query Language; the query language implemented in all
relational database management systems.

SRF Spatial Reference Frame.

SRS Spatial Reference System.

SST Sea Surface Temperature; as used in examples of Chapter 1.

TAI International Atomic Time.

TIN Triangulated Irregular Network.

UT Universal Time.

UTC Coordinated Universal Time.

viz. Namely (videlicet).

WGS84 World Geodetic System 1984.

WS Wind Speed; as used in examples of Chapter 1.

WWW World Wide Web. In a broad sense, the global internet with all the
information and services that can be found there.

Terms

Agent-Based Model (ABM) A model that represents a process in the form of
multiple (possibly interacting) agents (which might represent individuals),
using sets of decision rules about what each agent can and
cannot do. A key notion is that simple behavioural rules for
individual agents generate complex behaviour for the entire 'system'.
Agent-based models have been developed to understand aspects of
complex systems, for example by incorporating stochastic and/or
deterministic components.

Algorithm A procedure used to solve a mathematical or computational problem,
or to address a data processing issue. Algorithms usually consist
of a set of rules written in a computer language.

Altitude The elevation of an object above a reference surface, usually mean sea 
level. 

Aspect The geographical direction toward which a slope faces, measured in 
degrees from north, in a clockwise direction. 

Attribute Data associated with a spatial feature or sample location, stored as a 
column in a database table. The name of the column should suggest 
what the values in that column stand for. These values are known as 
attribute values. 

Autocorrelation See 'spatial autocorrelation'.

Azimuth In mapping and navigation, azimuth is the direction to a target with 
respect to north and usually expressed in degrees. 
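
As a minimal sketch of how an azimuth can be computed, the function below takes two points in planar map coordinates (easting x, northing y) and returns the direction from the first to the second. It assumes clockwise measurement from grid north, the same convention the Aspect entry uses; the function name is hypothetical.

```python
import math


def azimuth_deg(x0: float, y0: float, x1: float, y1: float) -> float:
    """Direction from (x0, y0) to (x1, y1), in degrees clockwise from grid north."""
    dx = x1 - x0  # easting difference
    dy = y1 - y0  # northing difference
    # atan2(dx, dy) measures the angle from the +y (north) axis towards +x (east).
    return math.degrees(math.atan2(dx, dy)) % 360.0


print(azimuth_deg(0, 0, 1, 0))  # target due east
```

Note the argument order of `atan2`: swapping `dx` and `dy` would give the mathematical angle counter-clockwise from the x-axis instead.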
Bandwidth The range of frequencies in a radio signal. The wider the range, the
more data can be carried.

Base data Spatial data prepared for different uses. Typically, large-scale
topographic data at the regional or national level, as prepared by a
national mapping organization. Sometimes also known as foundation
data.

Buffer Area surrounding a selected set of features. May be defined in terms 
of a fixed distance, or by a more complicated relationship that the 
features may have on their surroundings. 

Cartography The whole of scientific, technological and artistic activities
directed to the conception, production, dissemination and use of map
displays.

Categorical data See 'Nominal data'. 

Centroid Informally, a geometric object's midpoint; more formally, can be
defined as the centre of the object's mass, i.e. that point at which it would
balance under a homogeneously applied force like gravity.
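
For a polygon, the centre of mass can be computed directly from the vertex coordinates. The sketch below uses the standard area-weighted (shoelace) formula and assumes a simple, non-self-intersecting polygon of uniform density; the function name is hypothetical.

```python
def polygon_centroid(vertices):
    """Centre of mass of a simple polygon given as [(x, y), ...]."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area) and area = area2 / 2, hence 3 * area2.
    return cx / (3.0 * area2), cy / (3.0 * area2)


print(polygon_centroid([(0, 0), (4, 0), (4, 2), (0, 2)]))  # (2.0, 1.0)
```

For a concave polygon the centroid computed this way may fall outside the polygon itself, which matters when it is used as a label position.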

Channel In satellite-based positioning, the circuitry of the receiver that allows
it to receive the signal of a single satellite.

Check point An additional ground point used to independently verify the degree
of accuracy of a geometric transformation (e.g. georeferencing,
aerial triangulation).


Clearinghouse Centralized repository, often forming part of a Spatial Data
Infrastructure, where data users can 'shop' for spatial data.
Clearinghouses usually have an entrance through the World Wide Web,
referred to as a 'web portal'.

Clock bias In satellite-based positioning, the difference between a receiver's 
clock reading and that of the (largely synchronized) satellite clock(s). 

Concave A 2D polygon or 3D solid is said to be concave if there exists a
straight line segment having its two end points in the object that does
not lie entirely within the object. Analogously, a terrain slope is
concave if it (locally) has the shape of a concave solid. See
also convex.

Contour line An elevation isoline. Valuable especially in map production. 

Contour map Map in which contour lines are used to represent terrain
elevation.

Control segment The worldwide network of satellite monitoring and control 
stations that ensure accuracy of satellite positions and clocks. 

Convex A 2D polygon or 3D solid is said to be convex if every straight line
segment having its two end points in the object lies entirely within the
object. Analogously, a terrain slope is convex if it (locally)
has the shape of a convex solid. See also concave.
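
For 2D polygons the line-segment definition has a practical equivalent: a simple polygon is convex exactly when its boundary always turns in the same direction. The sketch below tests this with cross products of consecutive edges. It assumes a simple (non-self-intersecting) polygon, and the function name is hypothetical.

```python
def is_convex(vertices):
    """True if the simple polygon [(x, y), ...] is convex, False if concave."""
    n = len(vertices)
    sign = 0  # direction of the first non-straight turn: +1 left, -1 right
    for i in range(n):
        ax, ay = vertices[i]
        bx, by = vertices[(i + 1) % n]
        cx, cy = vertices[(i + 2) % n]
        # z-component of the cross product of edges a->b and b->c
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross != 0:
            turn = 1 if cross > 0 else -1
            if sign == 0:
                sign = turn
            elif turn != sign:
                return False  # turn direction changed: a reflex corner exists
    return True


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(is_convex(square), is_convex(l_shape))  # True False
```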

Database An integrated, usually large, collection of data stored with the help
of a DBMS.

Database Management System A software package that allows its users to
define and use databases. Commonly abbreviated to DBMS. A generic
tool, applicable to many different databases.

Database schema The design of a database laid down in definitions of the 
database's structure, integrity rules and operations. Stored also with 
the help of a DBMS. 

Delaunay triangulation A partitioning of the plane into triangles, using a
given set of points as the triangles' corners, that is in a sense optimal.
The optimality characteristic makes the resulting triangles come out as
close to equilateral as possible. The circle going through the three corner
points of any triangle will not contain other points of the input set.
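
The empty-circle property in the last sentence can be tested numerically. The sketch below uses the standard in-circle determinant and assumes the triangle's corners are given in counter-clockwise order; the function names are hypothetical. It checks one triangle only and does not construct a triangulation.

```python
def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circle through a, b and c.

    a, b, c must be listed in counter-clockwise order.
    """
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 0


def satisfies_delaunay(triangle, other_points):
    """True if no other input point falls inside the triangle's circumcircle."""
    a, b, c = triangle
    return not any(in_circumcircle(a, b, c, p) for p in other_points)


tri = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
print(satisfies_delaunay(tri, [(2.0, 2.0)]))  # True
print(satisfies_delaunay(tri, [(0.9, 0.9)]))  # False
```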

Deterministic (In the context of an application model), a procedure or function 
that generates an outcome with no allowance or consideration for 
variation. Deterministic models are good for predicting results when 
the input is predictable, and the exact functioning of the 'process' is 
known. The opposite of stochastic. 

Digital Elevation Model A representation of a surface in terms of elevation 
values that change with position. Elevation can refer to the ground 
surface, a soil layer, etc. According to the original definition data 
should be in a raster format. 

Digital Terrain Model (DTM). A digital representation of terrain relief in terms
of (x, y, z) coordinates and possibly additional information (on breaklines
and salient points). Usually, z stands for elevation, and (x, y) for
the horizontal position of a point. To the concept of a DTM it does not
matter whether z is orthometric or ellipsoidal elevation. Horizontal
position can be defined by geographic coordinates or by grid coordinates
in a map projection. DTM data can be given in different forms
(contour lines, raster, TIN, profiles, etc.).

Dilution of precision A factor of multiplication that (negatively) affects the
ranging error in satellite-based positioning. It is caused by a non-optimal
geometry of the satellites used for positioning in the receiver.

Doppler aiding A technique that satellite positioning receivers use to improve
their reception of satellite signals, and to improve the accuracy with
which they determine velocity of the receiver, on the basis of a measured
Doppler effect.

Doppler effect The change in frequency of a radio signal caused by the relative 
motion of the transmitter with respect to the receiver. 

Dynamic map (Also: cartographic animation); map with changing contents, 
and/or changing ways of representation of these contents, whether 
triggered by the user or not. 

Epoch (Precise) date and time. Used to register at what moment a measurement
took place. In this book, the moment at which the measurements
took place for fixing ('freezing') the positions of the fundamental
polyhedron of a spatial reference frame.

Error matrix The matrix that compares samples taken from the data to be
evaluated with observations that are considered as correct (reference).
The error matrix allows calculation of quality parameters such as
overall accuracy, error of omission, and error of commission.
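
The three quality parameters follow directly from the row and column totals of the matrix. The sketch below assumes that rows hold the classes of the data under evaluation and columns hold the reference classes; texts differ on this layout, so treat it as an assumption. The sample counts are invented for illustration only.

```python
def accuracy_measures(matrix):
    """Return (overall accuracy, errors of omission, errors of commission).

    matrix[i][j] counts samples assigned to class i in the evaluated data
    whose reference class is j (rows = evaluated, columns = reference).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(n))
    row_totals = [sum(matrix[k]) for k in range(n)]
    col_totals = [sum(matrix[i][k] for i in range(n)) for k in range(n)]
    # Commission: samples put into class k that do not belong there.
    commission = [1 - matrix[k][k] / row_totals[k] for k in range(n)]
    # Omission: reference samples of class k that were missed.
    omission = [1 - matrix[k][k] / col_totals[k] for k in range(n)]
    return correct / total, omission, commission


sample = [[45, 5],   # invented counts, two classes
          [10, 40]]
overall, omission, commission = accuracy_measures(sample)
print(overall)  # 0.85
```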
Euclidean space A space in which locations are identified by coordinates, and
with which usually the standard, Pythagorean distance function between
locations is associated. Other functions, such as direction and
angle, can also be present. Euclidean space is n-dimensional, and we
must make a choice of n, being 1, 2, 3 or more. The case n = 2 gives
us the Euclidean plane, which is the most common Euclidean space in
GIS use.
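
The Pythagorean distance function works the same way for any choice of n. A minimal sketch, with a hypothetical function name:

```python
import math


def euclidean_distance(p, q):
    """Pythagorean distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(euclidean_distance((0, 0), (3, 4)))        # n = 2: 5.0
print(euclidean_distance((0, 0, 0), (1, 2, 2)))  # n = 3: 3.0
```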

Evapotranspiration (Sometimes erroneously written as evapotransporation);
the process by which surface water, soils, and plants release water
vapour to the atmosphere through evaporation (surface water, soils)
and transpiration (plants).

Exploratory cartography Interactive cartographic visualization of not well- 
understood spatial data by an individual to stimulate visual thinking 
and to create insight in and overview of the spatial data. 

Feature Collective noun to indicate either a point, polyline or polygon vector 
object, when the distinction is not important. 

Filter In the context of this book, an algorithm for eliminating, reducing,
attenuating or extracting information from raster data; see also filtering.

Filtering Computational process of changing given values such that a contained
component is either eliminated, reduced, attenuated, or extracted.
Examples include extracting slope information from an elevation
raster using x- and y-gradient filters, or extracting boundaries
from polygons represented as rasters.
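
The slope example can be sketched as follows. The code applies simple central-difference gradient filters to an elevation raster stored as a list of rows; these particular kernels are an assumption chosen for brevity, and the filters used elsewhere in the book may differ. The function name and sample raster are hypothetical.

```python
import math


def slope_deg(elev, row, col, cell_size):
    """Slope in degrees at an interior cell of an elevation raster."""
    # x-gradient filter [-1, 0, 1] / (2 * cell_size), applied along the row
    dz_dx = (elev[row][col + 1] - elev[row][col - 1]) / (2.0 * cell_size)
    # y-gradient filter, the same kernel applied along the column
    dz_dy = (elev[row + 1][col] - elev[row - 1][col]) / (2.0 * cell_size)
    # The gradient magnitude is the tangent of the slope angle.
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# A plane rising 10 m per 10 m cell in the x-direction.
elevation = [[0, 10, 20],
             [0, 10, 20],
             [0, 10, 20]]
print(round(slope_deg(elevation, 1, 1, cell_size=10.0), 1))  # 45.0
```

Cells on the raster edge lack a full neighbourhood and would need separate treatment, for instance one-sided differences.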
Foundation data See 'base data'.

Geo-webservices Software programs that act as an intermediary between web
users and geographic data(bases). These can vary from a simple map
display service to a service which involves complex spatial calculations.

Geographic dimension Spatial phenomena exist in space and time. The geographic
dimension is the space factor in this existence, and determines
where the phenomenon is present.

Geographic field A geographic phenomenon that can be viewed as a (usually
continuous) function in the geographic space that associates with
each location a value. Continuous examples are elevation or depth,
temperature, humidity, fertility, pH et cetera. Discrete examples are
land use classifications and soil classifications.

Geographic information (also: 'Geoinformation'). Information derived from
spatial data. Strictly speaking, information is derived by humans using
mental processes, so geographic information too is made of mental
'matter' only. Day-to-day use of the term, however, allows us to
exchange it with 'spatial data'.

Geographic Information System A software package that accommodates the
entry, management, analysis and presentation of georeferenced data.
It is a generic tool applicable to many different types of use (GIS
applications).

Geographic phenomenon Any man-made or natural phenomenon (that we
are interested in).

Geographic space Space in which locations are defined relative to the Earth's
surface. The usual space that GIS applications work with.

Georeferenced Data is georeferenced when coordinates from a geographic
space have been associated with it. The georeference (spatial reference)
tells us where the object represented by the data is, was or will
be; an abbreviation of 'geographically referenced'.

Geospatial data Data related to locations on (the surface of) the Earth. In this 
book, usually abbreviated to 'spatial data'. 

Geovisualization Making spatial data 'visible' by means of maps generated 
through interactive and dynamic software tools. 

GIS application Software specifically developed to support the study of
geographic phenomena in some application domain in a specific project.
A spatial data set as stored in a GIS, together with functions on the
data. Serves a well-defined purpose, making use of GIS functionality.
Distinguished from the software (the GIS package, the database
package) that can be applied generically.

Global Positioning System The American satellite-based positioning system.
More generally, satellite surveying method providing accurate geodetic
coordinates for any point on the Earth at any time.

Granularity The level of detail with which something is represented. 

Grid A network of regularly spaced horizontal and perpendicular lines (as
for locating points on a map). We may associate (field) values with
the nodes of the grid. In contrast to a raster, the associated values
represent point values, not cell values. This subtlety is often (and
can often be) glossed over, especially when point distances are small
relative to the variation in the represented phenomenon. By default a
grid is two-dimensional.

Ground Control Point (GCP). A ground point reliably identifiable in the
image(s) under consideration. It has known coordinates in a map or
terrain coordinate system, expressed in the units (e.g. metres, feet) of
the specified coordinate system. GCPs are used for georeferencing
and image orientation.

Image In the context of this book, this term refers to raw data produced
by an electronic sensor, which are not pictorial, but arrays of digital
numbers related to some property of an object or scene, such as
the amount of reflected light. An image may comprise any number of
bands. When the reflectance values have been translated into some
'thematic' variable we refer to it as a raster. An image consists of pixels,
whereas a raster is composed of cells.

Integer Any 'whole' number in the set {..., -2, -1, 0, 1, 2, ...}. Computers
cannot represent arbitrarily large numbers, and some maximum (and
minimum) integer is usually indicated.

Interpolation (From Latin interpolire, putting in between). Estimating the value 
of a (continuous) variable that is given by n sampled values at some 
intermediate point or instant. See 'spatial interpolation'. 

Interval data Data values that have some natural ordering amongst them, and
that allow simple forms of arithmetic computations like addition and
subtraction, but not multiplication or division. Temperature measured
in degrees Celsius is an example.

Isoline A line in the map of a spatial field that identifies all locations with 
the same field value. This value should be used as tag of the line, or 
should be derivable from tags of other lines. 

Kernel In the context of this book, a window of a given size and shape used 
in 'moving-window' operations on point data, or a neighbourhood of n 
by m cells used in operations on raster data. See 'filter'. 

Latitude/Longitude The coordinate components of a spherical coordinate system,
referred to as geographic coordinates. The latitude is zero on the
equator and increases towards the two poles to a maximum absolute
value of 90°. The longitude is counted from the Greenwich meridian
positively eastwards to the maximum of 180°.

Least-squares adjustment A method of correcting observations in which the
sum of the squares of all the residuals derived by fitting the observations
to a mathematical model is minimised. Least-squares adjustment
is based on probability theory and requires a (large) number of
redundant measurements.

Line A computer representation of a geographic object that is perceived 
as a one-dimensional, i.e. curvilinear entity. The line determines two 
end nodes plus a, possibly empty, list of internal points, known as 
vertices. Other words for 'line' are polyline (emphasising the multiple
linear segments), arc or edge.

Man-made phenomenon An object, occurrence or event that was created by
humans. This population of entities is large and difficult to define:
anything that can be georeferenced and originates from man can be a
'man-made phenomenon'.

Map A simplified, purpose-specific graphical representation of geographic
phenomena, usually on a planar display. Defined in [7] as "A tool for
presenting geographic information in a way that is visual, digital or
tactile."

Map coordinate system A system of expressing the position of a point on the
Earth's surface by planar rectangular coordinates using a particular
map projection, such as UTM, Lambert's conical projection, or an
azimuthal stereographic projection (as used in the Netherlands).

Map generalization The meaningful reduction of map content to accommodate
scale decrease.

Map projection The functional mapping of a curved horizontal reference surface
onto a flat 2D plane, using mathematical equations.

Map scale The ratio of distance on the map to the corresponding horizontal 
distance in 'real world' units. The ratio is commonly expressed as 
1 : m, where m is the scale factor (e.g. 1:25,000). 
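
As a worked example of the ratio 1 : m, the sketch below converts a distance measured on the map into the corresponding horizontal ground distance. The function name and the choice of centimetres and metres as units are assumptions for illustration.

```python
def ground_distance_m(map_distance_cm: float, scale_factor: float) -> float:
    """Ground distance in metres for a distance measured on a 1 : scale_factor map."""
    return map_distance_cm * scale_factor / 100.0  # 100 cm per metre


print(ground_distance_m(4.0, 25_000))  # 4 cm on a 1:25,000 map -> 1000.0 m
```

The same 4 cm on a 1:100,000 map would correspond to 4 km, which is why a larger scale factor m means a smaller-scale map.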

Multi-path error A ranging error that occurs when the satellite signal is
received multiple times and these receptions interfere. This is normally
caused by reflections off objects.

Natural phenomenon An object, occurrence or event that originated naturally.
This population is large and difficult to define: see also 'man-made
phenomenon' as a contrast.

Nominal data Data values that serve to identify or name something, but that
do not allow arithmetic computations; sometimes also called categorical
data when the values are sorted according to some set of non-overlapping
categories.

Oblate ellipsoid The solid (i.e. a three-dimensional object) produced by
rotating an ellipse (i.e. a two-dimensional object) about its minor axis.
It is also known as spheroid, because it resembles a sphere flattened
(squashed) at the poles.

Orbit The path followed by one body (e.g. a satellite) in its revolution about 
another (e.g. the Earth). 

Ordinal data Data values that serve to identify or name something, and for 
which some natural ordering of the values exists. No arithmetic is 
possible on these data values. 

Polygon A computer representation of a geographic object that is perceived as 
a two-dimensional, i.e. area entity. The polygon is determined by a 
closed line that describes its boundary. Because a line is a piece-wise 
straight entity, a polygon is only a finite approximation of the actual 
area. 

Polyhedron A solid bounded by planar facets, i.e. a three-dimensional feature
of which the sides are flat surfaces. The fundamental polyhedron of the
ITRF is a mesh of foundation stations around the globe that are used
to define the ITRS.

Presentation cartography Cartographic visualization of spatial data for
presentation to a group of users (public visual communication).

Pseudorange In satellite-based positioning, a distance measurement obtained 
by a receiver from a satellite's signal. Uncorrected for clock bias. 

Ranging error In satellite-based positioning, the error made when a receiver 
determines the distance to a satellite. 

Raster A set of regularly spaced (and contiguous) cells with associated (field)
values. In contrast to a grid, the associated values represent cell values,
not point values. This means that the value for a cell is assumed
to be valid for all locations within the cell. This subtlety is often (and
can often be) glossed over, especially when the cell size is small relative
to the variation in the represented phenomenon. By default a
raster is two-dimensional.

Ratio data Data values that allow most, if not all, forms of arithmetic
computation, including multiplication, division, and interpolation. Typically
used for cell values in raster representations of continuous fields.

Relative positioning (Also: differential positioning) Determination of position
using another receiver with accurately known position that is
tracking the same satellite signals.


Sampling Selecting a representative part of a population for statistical
analysis; to this end various strategies can be applied, such as random
sampling, systematic sampling, stratified sampling, etc.

Satellite A manufactured vehicle intended to orbit the Earth, or another
celestial body.

Simplex A primitive spatial feature as recognized in topology. A 0-simplex is
a point, a 1-simplex an arc, a 2-simplex an area and a 3-simplex a body.
See simplicial complex.

Simplicial complex A combination, i.e. spatial arrangement, of a number of 
simplices, possibly of different dimension. 

Solid A true three-dimensional object. 

Space segment The constellation of satellites that can be used for positioning. 

Spatial autocorrelation The principle that locations which are closer together 
are more likely to have similar values than locations that are far apart. 
Often referred to as Tobler's first law of Geography. 

Spatial data In the broad sense, any data with which position is associated. 
See geospatial data. 

Spatial Data Infrastructure (SDI). The relevant base collection of technologies,
policies and institutional arrangements that facilitate the availability
of and access to spatial data.

Spatial data layer A collection of data items that belong together, and that can
be spatially interpreted. A raster is a spatial data layer, and so are
a collection of polygons, a collection of polylines, or a collection of
point features. Principles of correct data organization dictate that the
raster's cells (or the polygons, polylines or points) represent phenomena
of the same kind.

Spatial database A database that allows users to store, query and manipulate 
collections of georeferenced data. 

Spatial interpolation Any technique that allows one to infer some unknown
property value of a spatial phenomenon from values for the same property
of nearby spatial phenomena. The underlying principle is that nearby
things are most likely rather similar. Many spatial interpolation
techniques exist.
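
Inverse distance weighting (IDW), which the index of this book also lists, is one such technique. The sketch below is a minimal version that weights every sample by the inverse of its distance raised to a power; the default power of 2 and the use of all samples, rather than only those within a search radius, are assumptions made for brevity.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples is a list of (sx, sy, value) measurements.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # the location coincides with a sample point
        w = 1.0 / d ** power  # nearby samples get larger weights
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


measurements = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw(measurements, 1.0, 0.0))  # halfway between the samples: 15.0
```

Closer to the first sample the estimate is pulled towards its value, which is the "nearby things are similar" principle expressed as arithmetic.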

Spatial Reference Frame (SRF). A physical realization of a spatial reference
system, consisting of real point objects (ground stations) with their
coordinates in the used SRS. In fact, next to the coordinates of each
object, the object's motion in time, due to tectonic plate movement,
is also recorded.

Spatial Reference System (SRS). A 3D reference coordinate system with
well-defined origin and orientation of the coordinate axes. A mathematical
system.

Spatial relationship A mathematically defined relationship between two
simplicial complexes (objects), usually defining whether they are disjoint,
meet, overlap et cetera. Spatial relationships are the object of study in
topology.

Sphere The solid (i.e. a three-dimensional object) produced by rotating a
circle about one of its diameters.

Static map Fixed map (e.g. a paper map, possibly scanned for dissemination
through the World Wide Web) of which the contents and/or their cartographic
representation cannot be changed by the user.

Stochastic (In the context of an application model), a random or probability- 
based model component that generates different results from some 
initial value, depending on the probability function over time. The 
opposite of deterministic approaches or models. Used when we do 
not know the exact functioning of a process. 

String Any sequence of characters chosen from the alphabet plus a set of
other characters like punctuation symbols ('?', '!', et cetera) and
numbers. When typed to a computer, a string is usually surrounded
by a pair of double quotes.

Temporal dimension Spatial phenomena exist in space and time. The temporal
dimension is the time factor in this existence, and represents when
the phenomenon is present.

Tessellation (Also known as 'tiling'); a partition of space into mutually
disjoint cells that together form the complete study area. A raster is a
regular tessellation example, meaning that its constituent cells have
the same shape and size. In irregular tessellations, the cells differ in
shape and/or in size.

Thematic map A map in which the distribution, quality and/or quantity of a
phenomenon (or the relationship among several phenomena) is presented
on a topographic base.

Thiessen polygons A partitioning of the plane using a given set of points
and resulting in a set of polygons. Each polygon contains just one
point and is the area defined by those locations that are closest to this
point, and not another point in the input set. There is a natural
correspondence with the Delaunay triangulation obtained from the same
points.
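
The "closest point" rule gives a simple membership test even without constructing the polygon boundaries. The sketch below returns, for any location, the index of the input point whose Thiessen polygon contains it; the function name is hypothetical, and locations exactly equidistant from two points (on a polygon boundary) are assigned to the first of them.

```python
import math


def thiessen_owner(points, x, y):
    """Index of the input point whose Thiessen polygon contains (x, y)."""
    return min(range(len(points)),
               key=lambda i: math.hypot(x - points[i][0], y - points[i][1]))


stations = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
print(thiessen_owner(stations, 2.0, 1.0))  # 0
print(thiessen_owner(stations, 5.0, 6.0))  # 2
```

Applying this test to every cell centre of a raster yields a rasterized version of the Thiessen polygons.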

Topographic map A map that gives a general, realistic and complete, but
simplified representation to scale of the terrain (roads, rivers, buildings
and settlements, vegetation, relief, geographical names, et cetera).

Topological consistency The set of rules that determines what are valid spatial
arrangements of simplicial complexes in a spatial data representation.
A typical rule is for instance that each 1-simplex must be
bounded by two 0-simplices, which are its end nodes.

Topology Topology refers to the spatial relationships between geographical
elements in a data set that do not change under a continuous transformation.

Trend surface A 2D curved surface that is fitted through a number of point 
measurements, as an approximation of the continuous field that is 
measured. 

Triangulated Irregular Network (TIN). A data structure that allows us to
represent a continuous spatial field through a finite set of (location, value)
pairs and triangles made from them. Commonly in use as digital terrain
model, but can be used for geographic fields other than elevation.

Triangulation A complete partition of the study space into mutually
non-overlapping triangles, usually on the basis of georeferenced measurements.

Tuple A record or row in a database table; it will have several attribute
values. Pronounced as 'tapl'.

User segment The community of users and their satellite receivers, in
satellite-based positioning.

Visual variable (Also: graphic variable); an elementary way in which graphic
symbols are distinguished from each other. Commonly, the following
six visual variables are recognized: size, (lightness) value, texture,
colour, orientation and shape.

Web portal World Wide Web-based entrance to a spatial data clearinghouse.

accumulated flow count raster, 393
accuracy, 41, 275, 277-290, 302, 303 
attribute, 288 
location, 279 
positional, 277 
temporal, 290 
animated map, 472 
application model, 40,414 
area object, 88 
area size, 341, 344 

attribute, 76, 114, 119, 132, 143, 152, 
154,156, 288, 298, 348-351 
autocorrelation 
spatial, 73 

base data, 304 
boundary, 88, 89, 94, 97 
crisp, 71 
fuzzy, 71 

boundary model, 89 


buffer, 336 

buffer zone, 336, 385, 387,423,427 

cartographic generalization, 334 
cartographic grammar, 437, 458 
cartography, 430-482 
categorical data, 65 
cellular automata, 416 
centroid, 77, 341 
change detection, 118 
classification, 358-364 
automatic, 363 
equal frequency, 363 
equal interval, 363 
user-controlled, 361 
classification operator, 334 
classification parameter, 358 
clearinghouse, 271 
conformal map projection, 213 
connectivity, 87, 112, 337, 405-413
consistency

temporal, 290 
topological, 99 
contour line, 107, 337 
contour map, 467 
control point, 265 
control segment, 227 
coordinate systems 
planar, 196 
spatial, 196 

coordinate thinning, 309 
Coordinated Universal Time, 234 
coupling 

embedded, 418 
loose, 418 
tight, 418 

data 

3D, 101 

georeferenced, 35 
geospatial, 35 
spatial, 35, 72-115 
spatiotemporal, 116-121 
thematic, 85 

data layer, 88,114, 312, 335, 361, 366, 
386,421,437 
data preparation, 294 
data quality, 35, 274,421 


data standards, 36, 273 
database, 39,43 
geo-, 45 
spatial, 45-175 
datum 

global, 192 
local, 190 

datum transformation, 221-225 
deductive approaches, 417 
Delaunay triangulation, 388 
diffusion computation, 384, 392 
diffusion function, 390 
digitizing, 265-269 
automatic, 265 
manual, 265 
semi-automatic, 265 
dilution of precision, 242 
dimension 

geographic, 21 
spatial, 21 
temporal, 21 
dissolve, 295 
distance, 344 
dynamic map, 471-473 

edge matching, 307 
ellipsoid, 186 
embedded coupling, 418
equidistant map projection, 213
equivalent map projection, 213 
error, 106 

propagation, 419, 420 
Euclidean plane, 57 

facet, 102 
field 

continuous, 60, 62,77,82,105,141, 
310, 312, 315, 316 
differentiable, 62 
discrete, 60, 62,141, 310, 312, 313 
geographic, 59, 62-63 
filter, 399 
filtering, 399 

flow computation, 384, 393 
flow direction raster, 393 

Galileo, 253 
generalization 

cartographic, 436 
geo-webservices, 137, 271, 274 
geodatabase, 168,172,175 
geographic information science, see 
GIS, see GIS 

geographic information system, see GIS 
Geoid, 183 
geoinformatics, 33 


geoinformation, 72 
geometric transformation, 308 
georeferenced, 29, 33, 35, 57, 58 
geostatistics, 327 
GIS, 16, 21-23, 33-34,130 
definition of, 22 
GLONASS, 252 
GPS, 249-251 
grid, 76 

height, 186-467 
orthometric, 184 
hillshading, 61, 396 
horizontal datum, 186 
hypsometric tinting, 467 

image, 262 

inductive approaches, 417
information 

geographic, 16, 33, 35-37 
interior, 94, 97 

International Terrestrial Reference Frame, 
193, 226 

International Terrestrial Reference Sys¬ 
tem, 192, 254 

interpolation, 28, 29, 49, 72, 77, 84, 106, 310, 311, 315, 321, 327, 336
IDW, 322
trend surface, 316 
interval data, 65,454 
inverse distance weighting, 322 
isoline, 32,107, 312, 320, 321 

join condition, 163 

kernel, 399 
kriging, 327 

large-scale, 103 
latitude, 197 
length 

of polyline, 341, 344 
levelling 

geodetic, 184 
line object, 86 
line segment, 86 
lineage, 291 

local resistance raster, 390 
location, 341, 344 
object, 67 
longitude, 197 
loose coupling, 418 

map, 431-438, 456-482
large-scale, 435 
small-scale, 435 


thematic, 437 
topographic, 436, 437 
map algebra, 371 
map generalization, 444 
map grid, 202 
map legend, 475 
map output, 480-482 
map projection, 207-308 
changing, 219 
map scale, 41, 103, 435
map theme 

physical, 437, 440 
socio-economic, 437, 440 
map title, 475 
mapping equation 
forward, 208 
inverse, 209 
mean sea level, 183 
measurement, 339-344 
metadata, 36, 272, 291, 475
metric, 94 

minimal bounding box, 342 
minimal cost path, 391 
model, 53, 333 

agent-based, 416 
aggregate, 416 
application, 414 
dynamic, 416
individual, 416 
process, 416 
static, 416 

model generalization, 334 
modelling, 39, 53, 414
moving window averaging, 322 
multi-path reception, 240 
multi-representation spatial data, 305 
multi-scale spatial data, 304 

NDVI, 374 

neighbourhood function, 336, 382-393
network allocation, 410-411, 413
network analysis, 337, 405-413
network direction, 405 
network function, 337 
network partitioning, 406, 410 
network trace analysis, 411-413 
Niña, La, 19, 20, 31
Niño, El, 19-21, 24, 25, 31, 38, 40, 43, 44, 49

nominal data, 65, 454
normal map projection, 212 

object 

geographic, 60, 67-70, 109


oblique map projection, 212 
optimal path finding, 407-408 
ordinal data, 65, 455
orientation 
object, 67 

overlay function, 335, 366-380 
on raster data, 371-380 
on vector data, 367-369 
overshoot, 295 

phenomenon 
dynamic, 116 

geographic, 19, 21, 45, 57, 59-70
pixel, 263 
point object, 85 

polygon clipping operator, 368 
polygon intersection, 367 
polygon overwrite operator, 368 
polyhedron, 102 
positional fix, 229 
positioning 

2D and 3D, 231 
absolute, 228 
network, 246 
relative, 244 
satellite-based, 226-256 
precision, 275 
primary data, 262 
proximity function, 384-388
pseudorange, 228, 237, 239, 240 
Pythagorean distance, 341 

quadtree, 78, 105
qualitative data, 454 
quantitative data, 454, 464
query, 154 

raster, 75, 105, 262
raster calculus, 371 
raster cell, 263 
raster resolution, 325 
rasterization, 299 
ratio data, 66, 454
reclassification, 358 
redundancy 
data, 89 

reference surface, 182 
regression, 316 
relation, 152, 154, 156
relational data model, 154-164 
relationship 

topological, 97 
resolution, 29 
retrieval operator, 334 
root mean square error, 279 

SDI, 136, 271, 481


SDSS, 145 

search window, 336 
secondary data, 264 
selected object, 346 
selection object, 346 
selective availability, 237 
shape 

object, 67 
simplex, 96 

simplicial complex, 96, 99
size 

object, 67 

sliver polygon, 304 
slope, 336 

slope angle, 396, 401
slope aspect, 396, 402, 403
slope convexity, 396 
slope gradient, 401 
small-scale, 103 
solid, 101 
space, 57 

Euclidean, 57, 94 
geographic, 16, 17, 45, 49
metric, 94 
topological, 94 
space segment, 227 
spatial aggregation, 359 
spatial analysis, 46

spatial autocorrelation, 73, 79, 323, 327
spatial data, 17 

spatial data infrastructure, 136 
spatial dissolving, 359 
spatial information theory, 132 
spatial join, 368 
spatial merging, 359 
spatial reference system, 29 
spatial selection, 345-357 
interactive, 346 
using distance, 353 
using topology, 352 
spatio-temporal, 21 
standards, 136 
static map, 472, 481 
surface 

secant, 210 

tangent surface, 210 
tessellation, 74-79 
irregular, 78 
regular, 75 

Thiessen polygon, 313, 388 
tie point, 265 
tight coupling, 418 
time 

concepts of, 117 
representing in GIS, 117
TIN, 82, 106
tolerance, 283 
topological mapping, 93 
topology, 174, 301 
spatial, 91-99 
transformations 
coordinate, 217 

transverse map projection, 212 
trend surface, 320 
triangulation, 83, 321 
Delaunay, 84 
trilateration, 229 
tuple, 152, 154, 156
turning cost table, 407 

undershoot, 295 
user segment, 227 

vector, 74, 109

vectorization, 265, 267, 299, 312 
vertical datum, 184 
visibility function, 337 
visual hierarchy, 476 
visual interpretation, 266 
visual variable, 456 
visualization, 430-482 

web portal, 271 
WWW, 480
x-gradient filter, 402 
y-gradient filter, 402
Internet sites
General GIS sites

• The Open GIS Consortium (OGC) homepage 

• GIS dot com, ESRI site 

• Institut Géographique National, France

• United States Geological Survey (USGS) 

• Harvard University list of GIS sites 
• The European INSPIRE web portal

• The Dutch National Atlas online 

• National Geographic's Maps and Geography pages 

• Digital Chart of the World, at Pennsylvania State University, U.S.A.

• Pennsylvania State University Libraries, Maps Library 

• Ordnance Survey, United Kingdom 

• United States "Geospatial One Stop" web portal 

• United States Geological Survey (USGS) National Geologic Map Database 

• ESRI's Data Repository

• Worldwide National Statistical Offices 

• University of Texas at Austin, On-line Map Gallery 

• Refdesk dot com on Atlases and Maps 
• Geometric Aspects of Mapping; Division of Cartography, ITC

• Active GPS Reference System for the Netherlands (AGRS.NL) 

• ITRF homepage 

• SAtelliten POsitionierung System (SAPOS) 

• GeodIS page, maintained by Deutsches Geodätisches Forschungsinstitut
(DGFI), Geodetic Reference System 1980 (GRS80)

• International Earth Rotation and Reference Systems Service

• Office of the Surveyor General of Land Information New Zealand (LINZ), 
guide to datums, projections and heights 

• Ordnance Survey of Great Britain. A Guide to Coordinate Systems in Great 
Britain 
• The Open GIS Consortium "Learning Resources" page

• ColorBrewer, a useful online guide to using colour in maps and graphics 

• FreeGIS.org - free GIS software and data 

• The Generic Mapping Tools site 

• The Geographer's Craft GIS notes 

• NCGIA Core Curriculum in GIScience 